Looking through the logs, I’ve been seeing some strange queries from Yahoo’s crawler recently:
[19/Jul/2004:21:34:09 -0600] "GET /MadonnaCiconne/parcel-problems/mboic.htm HTTP/1.0"
[19/Jul/2004:21:43:14 -0600] "GET /ambush/000122.htm HTTP/1.0"
[20/Jul/2004:05:21:00 -0600] "GET /sis/000186/favorpopscandy.htm HTTP/1.0"
[20/Jul/2004:07:20:38 -0600] "GET /lokalen_pa_nett.htm HTTP/1.0"
It’s like bits and pieces of legitimate paths on my site are getting mixed in with random keywords. Either their crawler has gone a bit bonkers, or some other site out there is making up random links and it’s trying to follow those…
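For anyone who wants to look for the same pattern in their own logs, here is a rough sketch of pulling the timestamp and requested path out of entries shaped like the excerpt above. The regular expression and the function name `extract_requests` are my own assumptions for illustration. Real access logs usually carry more fields (client address, status code, user agent), so the pattern would likely need adjusting.

```python
import re

# Assumed entry shape, based only on the excerpt above:
#   [timestamp] "METHOD /path HTTP/x.y"
ENTRY = re.compile(r'\[([^\]]+)\]\s+"([A-Z]+) (\S+) HTTP/[\d.]+"')

def extract_requests(log_text):
    """Return a list of (timestamp, path) tuples found in log_text."""
    return [(ts, path) for ts, _method, path in ENTRY.findall(log_text)]

sample = (
    '[19/Jul/2004:21:34:09 -0600] "GET /MadonnaCiconne/parcel-problems/mboic.htm HTTP/1.0" '
    '[19/Jul/2004:21:43:14 -0600] "GET /ambush/000122.htm HTTP/1.0"'
)

for timestamp, path in extract_requests(sample):
    print(timestamp, path)
```

From there, comparing each extracted path against the list of pages that actually exist on the site would separate the odd requests from the legitimate ones.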
I heard something about cracker-kiddies using crawling and random search input to find security holes, but I don’t recall where I read it…
No, these are all from machines within the Yahoo crawler (Inktomi) address blocks.
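A check like that can be sketched with Python's standard `ipaddress` module. The address ranges below are reserved documentation blocks used purely as placeholders, not the real crawler ranges, and `in_crawler_blocks` is a hypothetical helper name. Substitute whatever blocks you have confirmed from a WHOIS lookup.

```python
import ipaddress

# Placeholder ranges only (reserved documentation blocks),
# NOT the actual Yahoo/Inktomi allocations.
CRAWLER_BLOCKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def in_crawler_blocks(addr, blocks=CRAWLER_BLOCKS):
    """Return True if addr falls inside any of the given networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in blocks)

print(in_crawler_blocks("192.0.2.17"))   # inside the first placeholder block
print(in_crawler_blocks("203.0.113.5"))  # outside both placeholder blocks
```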
I’ve been noticing this activity for a few months. I haven’t been able to come up with a good explanation for it yet.