Site: Error logs

I’ve spent the morning finding and correcting anomalous entries in the error log. I can see that some search engine, presumably Google, is extremely conscientious in trying to find stuff that isn’t here, such as 404 documents and favicons. I suspect that could be because I had set <meta name="robots" content="all">, but who knows. I’ve set it to “index,follow” now to see if that makes any difference at all.

Here, for example, are three clients / spiders that clearly overstep logic and end up clogging my error log, making it hard to pick out any relevant information. There are tons of entries like this:
[...] [error] [client 24.150.192.172] File does not exist: /path/movies/404.shtml
[...] [error] [client 24.150.192.172] File does not exist: /path/movies/favicon.ico
[...] [error] [client 66.249.64.45] File does not exist: /path/movies/robots.txt
[...] [error] [client 207.68.146.40] File does not exist: /path/wap/robots.txt
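
A minimal sketch of how entries like these could be tallied, to see which clients and which missing files account for most of the noise. It assumes the line shape shown in the excerpts above; the function name summarize and the sample list are made up for illustration, and a real log would be read from a file instead.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpts above:
#   [...] [error] [client 1.2.3.4] File does not exist: /some/path
MISSING = re.compile(r"\[client ([0-9.]+)\] File does not exist: ([^\s,]+)")

def summarize(lines):
    """Count 'File does not exist' errors per client and per requested path."""
    by_client = Counter()
    by_path = Counter()
    for line in lines:
        m = MISSING.search(line)
        if m:
            client, path = m.groups()
            by_client[client] += 1
            by_path[path] += 1
    return by_client, by_path

# Sample lines copied from the log excerpt (timestamps elided as in the post).
sample = [
    "[...] [error] [client 24.150.192.172] File does not exist: /path/movies/404.shtml",
    "[...] [error] [client 24.150.192.172] File does not exist: /path/movies/favicon.ico",
    "[...] [error] [client 66.249.64.45] File does not exist: /path/movies/robots.txt",
    "[...] [error] [client 207.68.146.40] File does not exist: /path/wap/robots.txt",
]

by_client, by_path = summarize(sample)
for client, count in by_client.most_common():
    print(client, count)
```

Sorting by count makes it easy to tell a single overeager spider from a genuinely broken link that many visitors hit.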

There were also a number of images missing since the relocation. And a rather puzzling SQL problem with NP_Related:
[...] [error] PHP Warning: mysql_num_rows(): supplied argument is not a valid MySQL result resource in /path/nucleus/plugins/NP_Related.php on line 221

As well as a rare header problem with the gallery:
[...] [error] PHP Warning: Cannot modify header information - headers already sent by (output started at /path/customheader.php:6) in /path/functions.inc.php on line 51
[...] [error] PHP Warning: Cannot modify header information - headers already sent by (output started at /path/customheader.php:6) in /path/theme.php on line 901

Hopefully I’ll be able to monitor and troubleshoot those problems now that all the trivial stuff is out of the way.

In addition …

I’ve now more or less cleaned up the log. I found the overlib file that I misplaced last week and, after realizing that it alone consumes almost 100MB of bandwidth each month, I removed it from the code. I removed the entire calendar while I was at it. Pretty useless stuff.
Also fixed the annoying Nucleus path problem. Not that I understand why, but search spiders do a lot more work than they are supposed to. In my case they disregard the fact that the weblog is actually in the /blog subdir. So I had to fix that with a bit of htaccess code:

RewriteRule ^item/([0-9]+) http://battleangel.org/blog/item/$1 [nc]
RewriteRule ^archive/([0-9]+)/([0-9]+)-([0-9]+)-([0-9]+) http://battleangel.org/blog/archive/$1/$2-$3-$4 [nc]
RewriteRule ^archive/([0-9]+)/([0-9]+)-([0-9]+) http://battleangel.org/blog/archive/$1/$2-$3 [nc]

So now I can access the same day in history through either …
http://battleangel.org/blog/index.php?archive=2005-01-02&blogid=1
or http://battleangel.org/blog/archive/1/2005-01-02/
or http://battleangel.org/archive/1/2005-01-02/
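
To see what the three rules do to a given request path, they can be mimicked outside Apache. This is only a rough sketch: it assumes the rules live in the top-level .htaccess (so patterns are matched against the path without a leading slash), it stops at the first matching rule, and the helper name rewrite is made up for illustration. It does not model anything else mod_rewrite does.

```python
import re

# The three patterns from the .htaccess above, with $1 written as \1 for Python.
RULES = [
    (r"^item/([0-9]+)",
     r"http://battleangel.org/blog/item/\1"),
    (r"^archive/([0-9]+)/([0-9]+)-([0-9]+)-([0-9]+)",
     r"http://battleangel.org/blog/archive/\1/\2-\3-\4"),
    (r"^archive/([0-9]+)/([0-9]+)-([0-9]+)",
     r"http://battleangel.org/blog/archive/\1/\2-\3"),
]

def rewrite(path):
    """Return the target of the first matching rule, or the path unchanged."""
    for pattern, template in RULES:
        m = re.match(pattern, path, re.IGNORECASE)  # [nc] = case-insensitive
        if m:
            # As in mod_rewrite, the substitution replaces the whole path,
            # so anything after the matched part (e.g. a trailing slash) is dropped.
            return m.expand(template)
    return path

for p in ["item/42", "archive/1/2005-01-02/", "archive/1/2005-01", "blog/item/42"]:
    print(p, "->", rewrite(p))
```

The day-level archive rule has to be tried before the month-level one, since the month pattern would also match the start of a full date.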

Very useful. I also took the opportunity to add some hotlink protection that actually works, in that it also takes into account any subdomains that may exist:

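# Let requests without a referer through (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Let battleangel.org and any of its subdomains through
RewriteCond %{HTTP_REFERER} !^https?://([^/]+\.)?battleangel\.org([/:]|$) [nc]
RewriteRule .*\.(gif|jpg|png)$ http://battleangel.org/gfx/hotlinks.gif [nc]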
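
A referer pattern of this kind is easy to get subtly wrong, so it can be worth trying it against a few sample referers before putting it live. The sketch below is an assumption-laden illustration, not the rule set itself: the pattern is one strict way to spell "the bare domain or any subdomain", the helper name is_hotlink is made up, and the example.com and evil.com referers are placeholders.

```python
import re

# Assumed allow-pattern: battleangel.org itself or any subdomain of it,
# followed by a slash, a port, or the end of the string.
ALLOWED = re.compile(r"^https?://([^/]+\.)?battleangel\.org([/:]|$)", re.IGNORECASE)

def is_hotlink(referer):
    """True if an image request with this referer would be swapped out."""
    if referer == "":
        # Mirrors the first RewriteCond: an empty referer is let through.
        return False
    return ALLOWED.search(referer) is None

samples = [
    "",
    "http://battleangel.org/blog/",
    "http://gallery.battleangel.org/x.jpg",
    "http://example.com/page",
    "http://battleangel.org.evil.com/",
]
for referer in samples:
    print(repr(referer), is_hotlink(referer))
```

Anchoring the domain on both sides is what keeps look-alike hosts from slipping through.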