The Dark Side Of The Net

You learn an awful lot when you run your own site, and it’s mostly driven by what people try to do to you. This week’s lesson was about ‘dark crawlers,’ web spiders that don’t play by the rules.

Any program that crawls the web for information (mostly search engines) is supposed to follow the robots exclusion standard, with the rules kept in the site’s robots.txt file. Not every part of a web site is suitable for crawling. For example, the /cgi-bin/mt-comments.cgi… links on my site would just contain redundant information already in the main articles and would place extra load on the server. So you use that file to tell crawlers which parts of your site not to bother loading. You can also use it to stop crawlers that are behaving in ways you don’t like (e.g., polling too often) based on a unique part of their user agent name.
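As a rough illustration of how a well-behaved crawler reads these rules, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents, the `BadBot` and `GoodBot` agent names, and the article path are all invented for the example; only the /cgi-bin/ prefix comes from the text above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one misbehaving crawler completely,
# and keep everyone else out of the CGI scripts.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# An ordinary article, not covered by any Disallow rule for GoodBot.
print(parser.can_fetch("GoodBot", "/archives/000123.html"))
# A comment script under /cgi-bin/, which every crawler is asked to skip.
print(parser.can_fetch("GoodBot", "/cgi-bin/mt-comments.cgi"))
# The same article again, requested by the crawler that is shut out entirely.
print(parser.can_fetch("BadBot", "/archives/000123.html"))
```

Nothing here enforces anything, of course. The parser only tells a crawler what it has been asked to do, and a crawler that never asks will never know.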

Adherence to that standard is voluntary though, and some crawlers ignore it entirely. Yesterday I noticed a large number of hits coming from two class-C subnets. Although their user agents made them look like regular web browsers (one Mac, one Windows), they were very rapidly working their way through every article on the site, in numerical order. A Google check on the IP address ranges quickly revealed that they belonged to a company called ‘Web Content International,’ which is apparently notorious for this kind of dark crawling.

A couple of new firewall rules to drop packets from them stopped their crawling, but then my kernel logs were getting flooded with firewall intrusion notices. They were apparently content to sit there retrying every few seconds for however long it took to get back in. Allowing them back in and having Apache return a 403 Forbidden page to requests from their address ranges instead seemed to finally make them stop.
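The actual Apache rules aren't shown here, but the decision they make is simple enough to sketch in Python: check whether the client address falls inside a blocked range, and answer 403 if it does. This is only an illustration of the logic, not the server configuration. The two networks below are reserved documentation ranges standing in for the real ones, and `BLOCKED_NETWORKS` and `status_for` are made-up names.

```python
import ipaddress

# Placeholder class-C sized (/24) ranges; the real subnets are not given above.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def status_for(client_ip):
    """Return the HTTP status a request from client_ip would receive."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return 403  # Forbidden: the client gets a definite answer
    return 200


print(status_for("192.0.2.17"))
print(status_for("203.0.113.5"))
```

The difference from the firewall approach is that the client gets a definite refusal instead of silence, which may be why the retries stopped.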

Is that kind of crawling really that bad? Crawling is a natural web activity now, the site is public information, and these guys probably don’t intend any harm, so it doesn’t seem *too* bad. There are worse offenders out there too, such as dark crawlers that specifically go into areas marked as ‘Disallowed’ in robots.txt even if they weren’t part of the original crawl, hoping to find juicy details like e-mail addresses to spam. Still, if they want to crawl they should be open about admitting they’re doing so by following the robot standards. If they’re going to be sneaky about getting my data, I’m perfectly within my rights in being sneaky in denying it to them.

Nibbles-n-bits-n-bits-n-bits-n-bits-n-…

Okay, having addressed 64-bit hardware, the question remains: who cares? What does it actually mean to you?

Well, nothing really. For all the end-user knows the system could be 8-bit or 500-bit or whatever internally, and it doesn’t matter. Linux or Windows or whatever you use will still look, feel, and run the same way. The only people it really makes a big difference to are the programmers, but in a roundabout way it does eventually wind up affecting the end-user.

Smaller Is Better?

Yeesh, I’m falling behind on all the tech news. I only just found out about the BTX form factor, which is supposed to be available fairly soon.

It sounds great on paper, allowing smaller cases, better cooling of components, more silent operation, etc., but already my hands are cringing in fear. Why? Because every time I open one of my cases to work on the internals, I wind up getting cuts on my hands from *somewhere*. I’m still not entirely sure where and I may not even notice until hours later, but something inside those cases is trying to murder me. Or cut my fingers off, at least. Smaller cases are just going to make it even worse!

I don’t think this is what they meant by progress sometimes requiring the sacrifice of blood…

Angst

Dramatic statement about how much life sucks. Complaint about how I just cannot take it anymore.

Description of events of a trivial nature, exaggerated to appear to be of monumental importance. Implication of wrongdoing by an unspecified friend, with blatant clues dropped as to the person’s identity. Vindictive derision of friend’s flaws and lack of loyalty. Gratitude that other friends are not like that person.

Unrelated comments about new element of personal appearance, minor complaint about some other aspect of same. Closing note about how I’ve made up with friend X from last week. Promise to be more upbeat in the next entry.

(Yeah, it’s an old joke, but a goodie.)

‘rm -rf’ Is Mine, Sayeth The Lord

Markov chain generators are interesting; given a body of source text, they analyze the statistical correlations between words and can generate random phrases in a similar style, almost intelligible in some cases even though they’re completely made up.
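For the curious, a word-level Markov chain generator fits in a few lines of Python. This is a minimal sketch, not the generator behind the linked output: `build_chain` and `generate` are hypothetical names, the sample text is made up, and real generators usually use longer word windows and much bigger source texts.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain


def generate(chain, length=12, seed=None):
    """Walk the chain from a random starting key, emitting up to `length` words."""
    if not chain:
        raise ValueError("source text is too short to build a chain")
    rng = random.Random(seed)
    key = rng.choice(list(chain))
    order = len(key)
    output = list(key)
    for _ in range(length - order):
        followers = chain.get(key)
        if not followers:
            break  # dead end: this key only appeared at the very end of the text
        output.append(rng.choice(followers))
        key = tuple(output[-order:])
    return " ".join(output)


# Made-up sample text, just to have something to chew on.
sample = "the kernel reads the header and the header names the file and the file is lost"
print(generate(build_chain(sample), length=10))
```

Because a follower is picked in proportion to how often it appeared after the current word, common phrases in the source tend to survive while the overall sentence wanders.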

What happens when you feed one both religious discussion and UNIX development chat? This.

Some highlights:

This is supported by Jesus’s use of low cost eight bit micros and small amounts of RAM.

I have modified the “standard” Berkley ftpd to allow for various types of failures in Scripture.

On a SVR4, I am interested in building a list of names and addresses to be in the name of Martin Luther, who led the religious reformation of the HP Laserjet

I know at one point Jesus said “no one may come to grips with the cpio header blown away”.

(found via the Linux kernel mailing list)

Vidiot

All I wanted to do was convert some clips… I’ve collected a number of little bits of video over the years, but they’re sometimes a bit of a pain to deal with. They’re in a zillion different formats, have to be played back on the computer, are often too small on the monitor and don’t look good when blown up to full screen, and need something a bit faster than my 400 MHz Linux box anyway. (The other, faster one’s often in XP for games.)

Fortunately there’s an easy solution: Video CDs! Simply convert everything to MPEG files, burn them to CD, and I can play them on my DVD player instead.

It turns out that video is a bit more complex than I had anticipated…

I Hate It When That Happens

There needs to be a word for that situation where you’ve pressed the monitor power button in but haven’t let go and you just realized there was one final thing you needed to do but you don’t want to stress the monitor by powering it off and then right on again so you’re standing there still holding the power button in with one hand while working the mouse and keyboard with the other trying to get those final few clicks in…

You Are All Deranged

Us geeks are a weird lot, and perhaps this helps explain some of it: Five Geek Social Fallacies

I can definitely see aspects of what he describes occurring in places like EverQuest. The whole notion of a guild in these games is an artificial construct put together by people rife with these sorts of problems, so it’s not surprising to see them occur within the guild as well.

The End of an Era

Alas, goatse.cx is no more. See this thread for more information on what happened behind the scenes. (Warning, disturbing inline images.)

What was it? If you don’t know, then consider yourself lucky. And do *not* visit that link above… If you really still want to know, well, it was a very candid look at a particular man’s rectal elasticity capabilities, to put it delicately (nothing to do with goats)…

What was the big deal then? In short, it was a focus of Internet mischief. It was a common volley in wars between people trying to gross each other out. Pranksters of all types would try to trick people into visiting it, through everything from simple direct links to fool newbies who’d never heard of it, all the way up to elaborate scripting to hide the true URL to catch the suspicious and wary. In many circles people are now so suspicious that they carefully check each URL they see posted several different ways before daring to click on it. It was a form of punishment, something you would direct people to after you caught them stealing your image bandwidth. It was also a test of mettle, to see just how jaded you’d become to seeing shocking things on the net.

And now it is gone, at least officially. It will live on in various forms, of course. Many, many people have already ‘experienced’ it and it is one of those images that once seen, cannot be un-seen, and will be forever burned into their minds. And certainly some people will have the image cached away, ready to strike from their own private servers. It just won’t be the same, though.

Oh well, there’s always tubgirl…

Maybe If I Play 24/7…

This is a stack of CDs. CDs containing games. Games which I STILL HAVEN’T FINISHED YET. Some of them I haven’t even *started*.

I never meant to fall so far behind, of course, but then something called EverQuest happened. I still managed to sneak in some time to play other games too, but the vast majority of the time my first impulse was to log into EQ, see if anything was happening, if anyone needed me for a group, etc. All these other games weren’t going anywhere, they’d still be there in the exact same place I saved, but getting things done in EQ often meant being online at the right place at the right time.

Normally that would have just meant that the couple of other games I was playing at the time would have been the ones to suffer, but of course there are more than that now. New games were still coming out, and I’d hear about them through the usual reviews and word of mouth and feel like they were worth checking out. So, I’d pick up new games, install them and fool around with them for a bit, but…whoops, time to log in for that raid on Chardok tonight. Before long they too would fall by the wayside.

So, now I’ve got this mountainous pile of unfinished games. I haven’t spent as much time in EQ lately so I do have some more time to spend on them, but it’s still going to be pretty slow going. I also still like going back to some of the highly-replayable games like the Civ series and Diablo 2, which doesn’t help work through this pile any.

I’m tempted to say ‘screw it’ to my usual gaming approach and just grab walkthroughs and whip through them all as fast as possible, just to at least enjoy the stories, dialogue, victories, etc…

Watch Those Wildcards

A RAID array may save your data from drive failure, but it won’t protect it from brain farts.

heide:/media/video/anime$ mkdir tmp ; cd tmp
heide:/media/video/anime/tmp$ cp ../bagi_part*
heide:/media/video/anime/tmp$ ls
heide:/media/video/anime/tmp$

Wait, what did I just do…

I just copied the part 1 file over top of the part 2 file, that’s what… Gotta be careful with those wildcard expansions, especially when ‘cp’ and ‘mv’ are involved. (In this case I forgot the ‘.’ parameter at the end to tell it to copy the files here instead of over each other.)
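To see why the missing ‘.’ matters, here is a small Python sketch of the two steps involved: the shell expands the wildcard first, and only then does ‘cp’ treat whatever ends up last in the argument list as the destination. The file names are hypothetical (only the `bagi_part*` pattern is known), and `expand_wildcard` and `interpret_cp` are made-up helpers that mimic the behaviour rather than call the real tools.

```python
import fnmatch


def expand_wildcard(pattern, names):
    """Return the sorted names matching pattern, roughly as a shell glob would."""
    return sorted(fnmatch.filter(names, pattern))


def interpret_cp(args):
    """Split arguments the way cp reads them: the last one is the destination."""
    if len(args) < 2:
        raise ValueError("cp needs at least a source and a destination")
    *sources, destination = args
    return sources, destination


# Hypothetical listing of the parent directory; the real names are not given.
parent_dir = ["bagi_part1.avi", "bagi_part2.avi", "notes.txt"]
matches = expand_wildcard("bagi_part*", parent_dir)

# What was typed: cp ../bagi_part*
# The second match silently becomes the destination.
print(interpret_cp(matches))

# What was meant: cp ../bagi_part* .
# Both matches are sources, and '.' is the destination.
print(interpret_cp(matches + ["."]))
```

With exactly two matches the mistake goes through without complaint. With three or more, ‘cp’ would normally refuse because the last argument isn't a directory, so the two-file case is the dangerous one.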

Fortunately I hadn’t wiped the Linux partition on the old system yet and could recopy it back from there. Laziness pays off sometimes…

How Do I Hate Thee? Let Me Count The Ways…

Lotus Notes must die. It was pushed onto us when we got bought out by our parent company and we had to integrate with their e-mail system, and it’s come to be universally loathed around the office. Outlook/Exchange had their own fair share of problems, but at least they were well-known and livable.