For various reasons I looked at Cathy's web logs for yesterday. This stood out as something of interest:
69.31.80.114 - - [15/Sep/2006:00:04:00 +0100] "GET /cgi-bin/mt/mt-tb.cgi?__mode=view&entry_id=451 HTTP/1.0" 403 338 "-" "msnbot/0.9 (+http://search.msn.com/msnbot.htm)"
69.31.80.114 - - [15/Sep/2006:00:04:00 +0100] "GET /cgi-bin/mt/mt-tb.cgi?__mode=view&entry_id=478 HTTP/1.0" 403 338 "-" "msnbot/0.9 (+http://search.msn.com/msnbot.htm)"
69.31.80.114 - - [15/Sep/2006:00:04:00 +0100] "GET /cgi-bin/mt/mt-tb.cgi?__mode=view&entry_id=506 HTTP/1.0" 403 338 "-" "msnbot/0.9 (+http://search.msn.com/msnbot.htm)"
69.31.80.114 - - [15/Sep/2006:00:04:00 +0100] "GET /cgi-bin/mt/mt-tb.cgi?__mode=view&entry_id=483 HTTP/1.0" 403 338 "-" "msnbot/0.9 (+http://search.msn.com/msnbot.htm)"
69.31.80.114 - - [15/Sep/2006:00:04:00 +0100] "GET /cgi-bin/mt/mt-tb.cgi?__mode=view&entry_id=858 HTTP/1.0" 403 338 "-" "msnbot/0.9 (+http://search.msn.com/msnbot.htm)"
69.31.80.114 - - [15/Sep/2006:00:04:00 +0100] "GET /cgi-bin/mt/mt-tb.cgi?__mode=view&entry_id=480 HTTP/1.0" 403 338 "-" "msnbot/0.9 (+http://search.msn.com/msnbot.htm)"
69.31.80.114 - - [15/Sep/2006:00:04:00 +0100] "GET /cgi-bin/mt/mt-tb.cgi?__mode=view&entry_id=538 HTTP/1.0" 403 338 "-" "msnbot/0.9 (+http://search.msn.com/msnbot.htm)"
69.31.80.114 - - [15/Sep/2006:00:04:00 +0100] "GET /cgi-bin/mt/mt-tb.cgi?__mode=view&entry_id=389 HTTP/1.0" 403 338 "-" "msnbot/0.9 (+http://search.msn.com/msnbot.htm)"
It seems to work in short bursts of a minute or two, leeching pages. In total it hit Cathy 6481 times yesterday. That msnbot user-agent is interesting, given that Cathy has a robots.txt telling it not to index, and indeed, to Microsoft's credit, that IP has nothing to do with them. The netblock is currently assigned to a company called Pilosoft:
Pilosoft, Inc. NLYR-69-31-80-0-1 (NET-69-31-80-0-1) 69.31.80.0 - 69.31.87.255
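The start and end addresses in that whois record collapse into a single CIDR block, which is the form most firewalls and web servers want in a block rule. A minimal sketch using Python's standard ipaddress module; the variable names are my own and nothing here comes from the whois tooling itself:

```python
import ipaddress

# Range exactly as reported in the whois record above.
first = ipaddress.ip_address("69.31.80.0")
last = ipaddress.ip_address("69.31.87.255")

# summarize_address_range yields the CIDR blocks that cover the range.
blocks = list(ipaddress.summarize_address_range(first, last))
print(blocks)

# Confirm the address seen in the logs falls inside the netblock.
crawler = ipaddress.ip_address("69.31.80.114")
print(any(crawler in block for block in blocks))
```

The range comes out as the single block 69.31.80.0/21, and the crawler's address sits inside it.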
Just opening up that IP address gives us a 404 on a server called here4search.com, which, upon looking at the domain name, is about what you'd expect: useless junk.
Since Pilosoft's reputation is so good, I think 69.31.80.0/21 is just getting blocked from now on. Forging your user-agent, ignoring robots.txt (because it obviously only applies to the real msnbot) and not having a clue how to throttle requests suggests you're either scum or stupid; either way you're not welcome. Since they don't care about collateral damage, neither do I: the whole of Pilosoft is offline to me.
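For anyone wanting to run the same sort of check on their own logs, here is a rough sketch of counting hits per client and per minute. It assumes the Apache combined log format seen in the lines quoted above; the regular expression and the count_hits name are illustrative, not anything standard, and the two sample lines are copied from the log excerpt.

```python
import re
from collections import Counter

# Assumed: Apache "combined" log format, as in the excerpt above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def count_hits(lines):
    """Return (hits per (ip, agent), hits per (ip, minute))."""
    per_client = Counter()
    per_minute = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that are not in the expected format
        ip = m.group("ip")
        per_client[(ip, m.group("agent"))] += 1
        # "15/Sep/2006:00:04:00 +0100" -> "15/Sep/2006:00:04"
        per_minute[(ip, m.group("when")[:17])] += 1
    return per_client, per_minute

SAMPLE = [
    '69.31.80.114 - - [15/Sep/2006:00:04:00 +0100] "GET /cgi-bin/mt/mt-tb.cgi?__mode=view&entry_id=451 HTTP/1.0" 403 338 "-" "msnbot/0.9 (+http://search.msn.com/msnbot.htm)"',
    '69.31.80.114 - - [15/Sep/2006:00:04:00 +0100] "GET /cgi-bin/mt/mt-tb.cgi?__mode=view&entry_id=478 HTTP/1.0" 403 338 "-" "msnbot/0.9 (+http://search.msn.com/msnbot.htm)"',
]

per_client, per_minute = count_hits(SAMPLE)
for (ip, agent), hits in per_client.most_common(10):
    print(hits, ip, agent)
```

Feeding it a whole day's log instead of SAMPLE gives the totals per client, and per_minute shows where the bursts fall.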