I Ain’t No Huffman

In terms of compression efficiency, I knew there were some obvious places that could use improvement. In particular, my Huffman trees…weren’t even really Huffman trees. The intent was for them to be Huffman-like in that the most frequently seen symbols would be closest to the top of the tree and thus have the shortest bitstrings, but the construction and balancing method was completely different. Whenever a symbol’s count increased, I compared it to the count of the parent’s parent’s other child. If the current symbol’s count was now greater, the code swapped that node with the current symbol, inserted a new branch where the updated node used to be, and pushed the other child down a level.

Unfortunately, that method led to horribly imbalanced trees. It only considered nearby nodes when rebalancing, whereas changing the frequency of a symbol can actually affect the relationship of symbols on relatively distant parts of the tree as well. As an example, here’s what a 4-bit length tree wound up looking like with my original adaptive method:

Lengths tree:
    Leaf node 0: Count=2256 BitString=1
    Leaf node 1: Count=1731 BitString=001
    Leaf node 2: Count=1268 BitString=0001
    Leaf node 3: Count=853 BitString=00001
    Leaf node 4: Count=576 BitString=000001
    Leaf node 5: Count=405 BitString=0000001
    Leaf node 6: Count=313 BitString=00000001
    Leaf node 7: Count=215 BitString=000000000
    Leaf node 8: Count=108 BitString=0000000011
    Leaf node 9: Count=81 BitString=00000000101
    Leaf node 10: Count=47 BitString=000000001001
    Leaf node 11: Count=22 BitString=00000000100001
    Leaf node 12: Count=28 BitString=0000000010001
    Leaf node 13: Count=15 BitString=000000001000000
    Leaf node 14: Count=9 BitString=000000001000001
    Leaf node 15: Count=169 BitString=01
    Avg bits per symbol = 3.881052

If you take the same data and manually construct a Huffman tree the proper way, you get a much more balanced tree without the ludicrously long strings:

    Leaf node 0: Count=2256 BitString=10
    Leaf node 1: Count=1731 BitString=01
    Leaf node 2: Count=1268 BitString=111
    Leaf node 3: Count=853 BitString=001
    Leaf node 4: Count=576 BitString=1100
    Leaf node 5: Count=405 BitString=0001
    Leaf node 6: Count=313 BitString=11011
    Leaf node 7: Count=215 BitString=00001
    Leaf node 8: Count=108 BitString=000001
    Leaf node 9: Count=81 BitString=000000
    Leaf node 10: Count=47 BitString=1101000
    Leaf node 11: Count=22 BitString=110100110
    Leaf node 12: Count=28 BitString=11010010
    Leaf node 13: Count=15 BitString=1101001111
    Leaf node 14: Count=9 BitString=1101001110
    Leaf node 15: Count=169 BitString=110101
    Avg bits per symbol = 2.969368

That’s nearly a bit per symbol better. That may not sound like much, but with the original method there was barely any compression happening at all (about 3% on these 4-bit symbols), whereas a proper tree achieves just over 25% compression.
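For anyone who wants to check the arithmetic, here's a minimal Python sketch (not the original C++ code) that recomputes the average bits per symbol from the two listings above. The `(count, code length)` pairs are copied from the listings; the function names and the 4-bit raw symbol size used for the savings figure are just my choices for illustration.

```python
# (count, code length in bits) pairs copied from the two listings above.
ADAPTIVE = [(2256, 1), (1731, 3), (1268, 4), (853, 5), (576, 6), (405, 7),
            (313, 8), (215, 9), (108, 10), (81, 11), (47, 12), (22, 14),
            (28, 13), (15, 15), (9, 15), (169, 2)]
HUFFMAN = [(2256, 2), (1731, 2), (1268, 3), (853, 3), (576, 4), (405, 4),
           (313, 5), (215, 5), (108, 6), (81, 6), (47, 7), (22, 9),
           (28, 8), (15, 10), (9, 10), (169, 6)]


def average_bits(entries):
    """Weighted average code length: total emitted bits / total symbols."""
    total_symbols = sum(count for count, _ in entries)
    total_bits = sum(count * length for count, length in entries)
    return total_bits / total_symbols


def savings(entries, raw_bits=4):
    """Fraction saved compared to writing every symbol as raw_bits bits."""
    return 1.0 - average_bits(entries) / raw_bits


if __name__ == "__main__":
    for name, entries in (("adaptive", ADAPTIVE), ("huffman", HUFFMAN)):
        print(name, round(average_bits(entries), 6), round(savings(entries), 4))
```

Both tables cover the same 8096 symbols, so the difference comes entirely from the code lengths.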

So, I simply dumped my original adaptive method and made it construct a Huffman tree in the more traditional way, repeatedly pairing the two lowest-count nodes in a sorted list. To keep it adaptive, it still does the count check against the parent’s parent’s other child, and when it crosses the threshold it simply rebuilds the entire Huffman tree from scratch based on the current symbol counts. This involves a lot more CPU work, but as we’ll see later, performance bottlenecks aren’t necessarily where you think they are…
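As a rough illustration of the traditional construction, here's a short Python sketch that repeatedly merges the two lowest-count nodes and tracks only the resulting code lengths. It is not the original C++ code: the function name is hypothetical, and it uses a heap where the text describes a sorted list, which yields the same pairings.

```python
import heapq
from itertools import count


def huffman_code_lengths(counts):
    """Return the Huffman code length of each symbol, given its count."""
    if len(counts) == 1:
        return [1]  # a lone symbol still needs one bit
    tiebreak = count()  # unique id so equal weights never compare the lists
    # Heap entries: (weight, tiebreak id, symbols contained in this subtree)
    heap = [(weight, next(tiebreak), [sym]) for sym, weight in enumerate(counts)]
    heapq.heapify(heap)
    lengths = [0] * len(counts)
    while len(heap) > 1:
        # Pull out the two lowest-count nodes and join them under a new parent.
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        merged = syms1 + syms2
        for sym in merged:
            lengths[sym] += 1  # every symbol below the new parent gets one bit deeper
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return lengths


if __name__ == "__main__":
    # Counts for leaf nodes 0-15 from the lengths tree listing above.
    counts = [2256, 1731, 1268, 853, 576, 405, 313, 215,
              108, 81, 47, 22, 28, 15, 9, 169]
    print(huffman_code_lengths(counts))
```

Rebuilding "from scratch" in this sketch just means calling the function again with the current counts. Zero counts are fine as input, so a tree prepopulated with every possible symbol works too.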

My trees also differ from traditional ones in that they prepopulate the tree with all possible symbols with a count of zero, whereas usually you only insert nodes into a Huffman tree if they have a count greater than zero. This is slightly suboptimal, but it avoids a chicken-and-egg problem with the decoder not knowing what symbol a bitstring corresponds to if it doesn’t exist in the tree yet because it’s the first time the symbol has been seen.

Knowing that, and with the improved Huffman trees, another thing became clear: using Huffman trees for the offsets wasn’t really doing much good at all. With most files, the offset values are too evenly distributed, and many are never used at all. All those zero-count entries would get pushed down the tree and end up with longer strings, so the first time an offset got used it would often have a string longer than its basic bit length, causing file growth instead of compression. Instead, I just ripped those trees out and emitted plain old integer values for the offsets.
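To see the effect on a toy case, here's a Python sketch with made-up numbers: a 4-bit offset alphabet where half the values are used equally often and the other half have never been seen. The counts, the names, and the compact tree builder are all assumptions for illustration, not the original code or real file statistics.

```python
import heapq


def huffman_code_lengths(counts):
    """Compact Huffman builder that tracks only the code length per symbol."""
    # Heap entries: (weight, unique id, symbols in subtree); ids break ties.
    heap = [(weight, sym, [sym]) for sym, weight in enumerate(counts)]
    heapq.heapify(heap)
    lengths = [0] * len(counts)
    next_id = len(counts)
    while len(heap) > 1:
        w1, _, syms1 = heapq.heappop(heap)
        w2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (w1 + w2, next_id, syms1 + syms2))
        next_id += 1
    return lengths


OFFSET_BITS = 4                    # hypothetical raw size of an offset
counts = [10] * 8 + [0] * 8        # 8 evenly used offsets, 8 never seen yet
lengths = huffman_code_lengths(counts)
used, unused = lengths[:8], lengths[8:]

if __name__ == "__main__":
    print("used offsets:  ", used)
    print("unused offsets:", unused)
```

The used offsets only save about a bit each because they're so evenly spread, while every never-seen offset ends up well past the 4 raw bits (7 bits each with this particular tie-breaking), so its first appearance costs more than just writing the plain integer.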

The way I was constructing my trees also had another limitation: the total number of symbols had to be a power of two. With the proper construction method, an arbitrary number of symbols could be specified, and that allowed another potential optimization: merging the identifier tree and the literals tree. The identifier token in the output stream guaranteed that there would always be at least one wasted non-data bit per token, and often two. Merging it with the literals would increase the size of literal symbols, but the expectation is that the larger literal symbols would, on average, still be smaller than the sum of the identifier symbols and the smaller literal symbols, especially as more ‘special case’ symbols are added. Instead of reading an identifier symbol and deciding what to do based on that, the decoder would read a ‘literal’ symbol. If it was in the range 0-255, it was indeed a literal byte value and interpreted that way, but if it was 256 or above, it would be treated as having a following offset/length pair.
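Here's a minimal Python sketch of that decoder-side decision. It assumes the symbols have already been Huffman-decoded into plain integers, that the offset and length simply follow as two more integers, and that the offset counts backwards from the end of the output so far. Those details, and the names, are assumptions for illustration only; the actual bit layout isn't covered here.

```python
FIRST_SPECIAL = 256  # symbols 0-255 are literal bytes; 256 and up mean "match follows"


def decode(symbols):
    """Decode a flat list of ints: literals, or a special symbol + offset + length."""
    out = bytearray()
    stream = iter(symbols)
    for symbol in stream:
        if symbol < FIRST_SPECIAL:
            out.append(symbol)          # plain literal byte value
            continue
        offset = next(stream)           # assumed: distance back from end of output
        length = next(stream)
        if not 1 <= offset <= len(out):
            raise ValueError("offset points outside the decoded data")
        for _ in range(length):
            out.append(out[-offset])    # byte-by-byte copy from the history
    return bytes(out)


if __name__ == "__main__":
    print(decode([97, 98, 99, 256, 3, 6]))
```

The point is only that a single symbol lookup replaces the separate identifier read; anything at or above 256 switches the decoder into match handling.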

The range of offsets to handle would also have to change, but that’s for next time… With the Huffman tree improvements, my 195kB test file that compressed to 87.4kB before now compressed to 84.2kB. Still not as good as gzip, but getting there.

Compressing History

While sorting through some old files of mine, I happened upon the source code to a compression engine I’d written 18 years ago. It was one of the first things I’d ever written in C++, aside from some university coursework, and I worked on it in the evenings during the summer I was in Cold Lake on a work term, just for fun. Yes, I am truly a nerd, but there wasn’t really much else to do in a tiny town like that, especially when you only get 3 TV channels.

Looking at it now it’s kind of embarrassing, since of course it’s riddled with inexperience. No comments at all, leaving me mystified at what some of the code was even doing in the first place, unnecessary global variables, little error checking, poor header/module separation, unnecessary exposure of class internals, poor const correctness, and so on. It kind of irks my pride to leave something in such a poor state though, so I quickly resolved to at least clean it up a bit.

Of course, I have to understand it first, and I started to remember more about it as I looked over the code. It’s a fairly basic combination of both LZ77 pattern matching and Huffman coding, like the ubiquitous Zip method, but the twist I wanted to explore was in making the Huffman trees adaptive, so that the symbols would shift around the tree to automatically adjust as their frequency changed within the input stream. There were two parameters that controlled compression efficiency: history buffer size, and maximum pattern length. The history size controls how far back it would look for matching patterns, and the length controlled the upper limit on the length of a match that could be found.

Compression proceeded by moving through the input file byte by byte, looking for the longest possible exact byte match between the data ahead of the current position and the data in the history buffer just behind the current position. If a match could not be found, it would emit the current byte as a literal and move one byte ahead, and if a match was found, it would emit a token with the offset and length of the match in the history buffer. To differentiate between these cases, it would first emit an ‘identifier’ token with one of four possible values: one for a literal, which would then be followed by the 8-bit value of the literal, and three for offset and length values, with three different possible bit lengths for the offset so that closer matches took fewer bits. Only matches of length 3 or longer were considered, since two-byte matches would likely have an identifier+offset+length string longer than just emitting the two bytes as literals. In summary, the four possible types of bit strings you’d see in the output were:

    | ident 0 | 8-bit literal |

    | ident 1 | 'x'-bit offset    | length |

    | ident 2 | 'y'-bit offset        | length |

    | ident 3 | 'z'-bit offset            | length |
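The matching loop described above can be sketched in Python roughly like this. It's a brute-force illustration, not the original C++ code: the function names, the default history size and maximum length, and the tuple-based tokens are all assumptions, and it stops before the identifier and Huffman stages. It also assumes a match has to lie entirely within the history, which the description implies but doesn't spell out.

```python
MIN_MATCH = 3  # shorter matches are emitted as literals instead


def find_longest_match(data, pos, history_size, max_length):
    """Return (offset, length) of the longest match that lies fully in the history."""
    best_offset, best_length = 0, 0
    for candidate in range(max(0, pos - history_size), pos):
        length = 0
        while (length < max_length
               and pos + length < len(data)
               and candidate + length < pos  # stay inside the history buffer
               and data[candidate + length] == data[pos + length]):
            length += 1
        if length > best_length:
            best_offset, best_length = pos - candidate, length
    return best_offset, best_length


def tokenize(data, history_size=4096, max_length=18):
    """Turn bytes into ('literal', byte) and ('match', offset, length) tokens."""
    tokens, pos = [], 0
    while pos < len(data):
        offset, length = find_longest_match(data, pos, history_size, max_length)
        if length >= MIN_MATCH:
            tokens.append(("match", offset, length))
            pos += length
        else:
            tokens.append(("literal", data[pos]))
            pos += 1
    return tokens


def detokenize(tokens):
    """Inverse of tokenize, rebuilding the original bytes."""
    out = bytearray()
    for token in tokens:
        if token[0] == "literal":
            out.append(token[1])
        else:
            _, offset, length = token
            for _ in range(length):
                out.append(out[-offset])
    return bytes(out)


if __name__ == "__main__":
    sample = b"abcabcabcabc xyz abcabc"
    tokens = tokenize(sample)
    print(tokens)
    print(detokenize(tokens) == sample)
```

The two parameters map directly onto the history buffer size and maximum pattern length mentioned earlier: a bigger history finds more matches but needs more offset bits, and a bigger maximum length needs more length bits.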

And then I used a lot of Huffman trees. Each of these values was then run through a Huffman tree to generate the actual symbol emitted to the output stream, with separate trees for the identifier token, the literals, the lengths, and the three offset types. HUFFMAN TREES EVERYWHERE! The compression parameters were also written to a header in the file, so the decoder would know what history buffer size and maximum length to use.

It worked…okay… I’ve lost my original test files, but on one example text file of 195kB, my method compresses it down to 87.4kB, while ‘gzip -9’ manages 81.1kB. Not really competitive, but not too bad for a completely amateur attempt either. There’s still plenty of room for improvement, which will come…next time.

Wait, It’s 2011 Already?

The stagnation of my sites kind of reflects the same kind of stagnation in my life; not an awful lot of interest has happened recently, really. There’s nothing terribly wrong in my life, but there’s not enough that’s right. I need to make a to-do list and focus, so in no particular order:

  • Lose Weight
    Yeah, it’s not the first time I’ve said it, and might not be the last, but failing at it doesn’t make the need go away. It’s worse than ever, and it’s undoubtedly tied to a lot of other health and esteem issues.
  • Clean Up The Apartment
    It’s a mess, and I really need to sort through all of my stuff and decide what I really need to keep and what I can throw away. Seriously, I haven’t seen the top of my dining table in months. I have an ISA serial port card in one of these boxes. It’s been over 20 years since I’ve seen a computer that it would actually be useful in.
  • Get Presentable
    I need new glasses, clothes that are at least semi-fashionable, to keep up with haircuts more often… It’s not really a matter of giving up and conforming; my current ‘style’ is more the result of sloth and apathy than anything else. It doesn’t really make me feel any particular way, and I’m not gaining anything from it, but it does sometimes feel like people don’t take me as seriously as I’d like. Gotta get that weight down first though…
  • Get A Car
    Man, I don’t even have a driver’s license. It’s never really felt like I’ve had to get one, but as time goes on it does feel like the lack of one is limiting my options more and more, whether it be in determining where I can live, just getting out to see people, traveling without having to tolerate the alternatives, etc.
  • Develop A New Hobby
    Yeah, I’ve always loved computers and video games, but…I admittedly probably spend a bit too much time on them. It’s kind of depressing when I look at another person and wonder what I might have in common with them and the answer winds up being “well, maybe they have a level 75 paladin…” I’ve always kind of wanted to be able to draw at a level slightly beyond stick figures at least, so I need to crack open that art instruction book a friend got me and put some more time into it.
  • Make More Friends
    Not that there’s anything wrong with you guys, of course (assuming anyone other than Google’s bots actually reads this), but without more local friends there’s just too many nights left alone, no reason to get out of the house, no face-to-face meetings to develop relationships further…
  • Focus On Work More
    It feels like I’ve been slacking off a bit too much at the office. Not taking enough initiative, not keeping up with the current trends and techniques, not coding as robustly as I should… I don’t know if a new job is really the answer as the current one is still pretty good, so…I don’t know yet.

I’m 37 years old. I should have had this crap sorted out ages ago.

Spam Watch

Subject: Your elegant watch will keep you warm

What, does it have a space heater built-in?

Subject: You are old enough to have a designer watch

Wait, there are age limits on these things? Is there a pamphlet or legal statute I can check for this?

Subject: No watch will be able to compete with yours

No thanks, the last thing I need is the government trying to declare my wrist a monopoly.

Subject: An elegant watch will give you the wings.

I didn’t know Red Bull made watches now.

Subject: Change your life for better ? get a decent watch.

Just a ‘decent’ one, though. Wouldn’t want to be showy or anything.

Subject: Feel the wonder of having a tremendous instrument

We’re still talking about watches here, right?

Once More Unto The Beach

Okay, it’s not the first time I’ve said this, but I need to get serious about my weight and so I’ve kicked off the status updates in the sidebar again.

It’s not just so that the ladies will swoon over my magnificent abs, but I really need to start being more concerned about my general health. I’ve been feeling terrible lately, it’s only going to get worse if I don’t do anything about it, and I remember how great I felt when I was actually close to a normal weight.

The plan is pretty much the same as before: a reasonable calorie budget (around 1600-1800 per day for now), and more exercise. That means giving up things that had become bad habits like the morning brownie/rice krispie treat, mid-day snacks from the office vending machine, and stopping by the store every day and picking up more food for dinner than I really need. If I have a light breakfast bar in the morning and some simple soup or stew for dinner, that still leaves me enough calories for a decent lunch, so I don’t have to give up all my old habits.

Exercise for now will be limited to walking to work when I can. I hope to eventually get back to small workouts and walks around the neighbourhood again, but that’ll have to wait until…other problems have been fixed.

But It Still Doesn’t Remember Where I Left My Keys

Yesterday the memory upgrade for my laptop arrived and I installed it as soon as I got home, taking it from 2GB to 4GB. Fortunately, upgrading the memory on an MBP is fairly easy, only requiring the removal of three standard screws underneath the battery.

It was mainly meant for a (now postponed) trip so I could use VMware Fusion effectively, but the difference was immediately noticeable when I went to fire up WoW as well. Normally, running WoW on my laptop grinds and chugs and stutters a lot, mainly because I always have Firefox open as well, and together the two just use up too much memory. Now though, it’s smooth as silk, with WoW loading only a little bit slower than it does on my desktop machine.

Aw, I Don’t Get To Go To Court

I received a jury summons a few weeks ago, to be held tomorrow, but I called in today as instructed and was informed that it had been canceled and I was no longer needed. It’s actually kind of a shame; I was a bit curious about the whole process. Maybe I’ll just go sit in on some cases instead.

It was also a bit of a surprise that it had taken this long to get a summons (the first in my life), but I guess we just don’t have enough crime…

About Time

I recently upgraded my server to Ubuntu 9.10, and it finally fixed one thing that had been bugging me ever since I built this system: the audio drivers. The default drivers that came with Ubuntu wouldn’t properly set the line-in volume, so I had to go and get a newer version from Realtek’s site. But, every time there was a system update that refreshed the driver modules, I’d have to reinstall the newer drivers and reboot again. Fortunately, now the default drivers work perfectly fine as of this release, and I’ll hopefully never need to build them separately again.

It also updated MythTV, which was a bit of a surprise and I needed to go get a newer build of the OS X frontend. That took a while to get working because it would just suddenly stop running immediately after launching it, until I figured out that I had to run the main executable directly with a ‘-r’ option to reset which theme it wanted to use.

You Got Your Mac In My Windows!

All of the upgrades and reinstallation are done, and I now have a zippy essentially-new machine running Windows 7.

The most obvious change in 7 is in the taskbar. It uses large icons instead of a small icon and window title, all open windows of the same type are always consolidated into a single entry on the taskbar (previously it would only consolidate them once it started running out of room), and you can pin a running program to the taskbar in order to launch it again later. It’s basically a lot more like OS X’s Dock now.

The Explorer has also changed a bit. There’s no more ‘Explore’ option off the computer icon’s context menu, which is kind of annoying. And they’ve removed the tree view from the Explorer windows (but you can reenable it in the folder options), instead using the sidebar to emphasize a bunch of standard locations like your home directory, your music directory, network servers, etc. Which is also a lot like how Finder works…

Otherwise, things have gone fairly smoothly, and I haven’t really had any problems that I can attribute directly to Windows 7 itself. I still have to poke around and explore what else might be new, though.

Impatience

With Windows 7 still downloading at the office, I decided to do the hardware upgrades tonight, even though I didn’t quite hit my goal of finishing King’s Bounty first.

It actually went a bit smoother than expected, with only two major hiccups. The first came when I went to install a new 120mm fan to improve airflow and suddenly realized that I didn’t know which direction it was supposed to face… Fortunately it’s the same style as a couple other fans in the system, so I was able to deduce the direction from that, and confirmed it with a sheet of paper. The second was getting the BIOS settings right, since this motherboard doesn’t seem to do a very good job of autodetecting all the right settings. It only took a little bit of fiddling to get it up to the proper 2.83GHz speed, though.

It’s practically an entirely new machine at this point, with a new CPU (Q9550 quad-core, replacing an E6600 dual-core), more memory (from 2 to 6GB), bigger drives (1.25TB total), and a new video card (Radeon 4870). It is actually missing something, though — I yanked out the Audigy 2 sound card I’d been using before. Creative’s support for newer OSes with older cards has been a bit lacking, so I’ll take a chance on the onboard sound for now.

Now I just have to put some OSes back on it…

The Anticipation Is Killing Me

Woo, the new parts I ordered arrived today, much earlier than expected. But I can’t actually install them yet.

I only want to crack the case open once, so I have to install everything at the same time. But if I install the other hard drives, I’ll lose the OS and have to reinstall it. But Win7 isn’t ready yet for me, so I’d have to either reinstall Vista or restore the old install from the current drive, and there’s not much point in wasting time on that when I’d have to do a new install in a week now anyway.

So, for now the parts all sit on my kitchen table, taunting me, tempting me…

The End Of An Era

After talking with Shaw support, they’ve decided that it’s probably best if they just replace my cable modem with a newer one. They can’t really say if it’s the specific cause of my problems, but it’s a Motorola CyberSurfr, a positively ancient model at this point, and they’ve been trying to get people off of them anyway. I’ve been using this one for over ten years now, and the tech was surprised I’d still had one for that long, since most people experience problems and swap it out long before that point.

So, tomorrow I say goodbye to my old friend as I drop him off downtown. It may be old, but it served me well for an awfully long time.

Where Did 2008 Go?

With Windows 7 being released soon (less than two weeks away now, for us MSDN members), I figured it was time to consider upgrades for my main PC, so that I don’t have to mess with hardware changes post-install. Some upgrades are already essentially here — with the recent order of my backup drives, I’ve got a couple spare drives with plenty of space for games and apps, and I have another 4GB of RAM that I’d snuck into that same order.

I’d thought about upgrading the video card, but it felt a bit early since it wasn’t so long ago that I’d installed this one and my previous card lasted almost four years. But then I realized that, um, April 2007 was over two years ago, dumbass… It doesn’t feel like I’ve had it that long though, for some reason. The CPU is also starting to be a bottleneck in some cases (I’m looking at you, GTA4), so it could use a bit of a bump as well.

I don’t really feel like going all-out with a completely new system though, especially since a Core i7 CPU would also require a new motherboard and expensive memory, so this is only an interim upgrade and the next one will be the big one. I’m going for good performance/price ratios rather than raw performance, so I finally settled on getting a Q9550 CPU and a Radeon 4870 video card. They should easily tide me over at least a couple more years.

I’m also going to try to add another 120mm fan into the case. The drives run a little bit warm, and these new parts aren’t going to make things any cooler…

Latency Killed The Video Star

As I briefly mentioned before, streaming video has been the main victim of my recent network problems. It’s been an interesting opportunity to examine just how the different services are handling it:

YouTube: Videos load more slowly than usual, and I can’t start watching them right away. Given enough time, though, it does eventually load the whole thing, so I just have to pause it and wait until a decent amount is preloaded. A.

Google Video: Likewise, it’s slow to load but eventually gets there, though a bit slower overall than YouTube, I think. It just suffers its usual usability and quality problems, being the abandoned orphan of Google’s video services. A-.

Viddler: The loading bar sometimes stops and gives up in the middle of a video, causing playback to stop when it gets there. You can get it started again by clicking near that spot on the bar, though, and skipping around like that is fairly robust in general, so there’s at least a workaround. B-.

Dailymotion: Unfortunately, the loading bar stops frequently here, and seeking around its progress bar isn’t nearly as robust. Trying to click outside the already-loaded areas usually just gets me a “There were technical problems, reload this page” error. In order to watch the video, I’d need the entire thing to load in one shot, and I failed to achieve that in what must have been at least a dozen tries on a short, 4 minute video. For not even letting me get to a significant chunk of the video, they get an F.

The Darkness Attacks

I just had my first ever compact fluorescent bulb burn out, after being installed for…I don’t even remember how long now, but it’s been years. I was confused at first because there was no flash or pop or anything; it just didn’t come on, and initially I thought there was something wrong with the switch.

It’s the first time I’ve had to change a light bulb at all in years now, after switching over to CF bulbs.

Lazy Packets

My Internet performance at home has had these occasional bizarre hiccups lately. In the above example, not a single packet in a string of 100 pings between me and the cable modem head end was lost, but just look at the latencies. There’s no physical-level problem with the data getting through, but the gateway’s holding on to packets for up to five seconds?! Good luck playing WoW under conditions like that…

Never Enough Space

After having bought a pair of 1TB drives for my new Linux box, I now have a set of three 1.5TB drives on the way to me. Damn, that’s a lot of storage.

I was actually waiting for 2TB drives to come down in price, but two 1.5TB drives together are still cheaper than a single 2TB drive. These ones will be used to complete the rest of my backup plan — right now I only have a 500GB external drive for my Linux box’s backups, and it’s 95% full. And that’s doing a straight mirror, without any room for daily differentials and rotating sets. Two drives will be used for that so I can keep one offsite, and the third drive will be for the Windows box’s backups.

Then, I can take the current backup drives and swap them into the gaming PC at the same time I upgrade to Windows 7, which should take me from 480GB to 1.2TB on there. Then I’ll never have to uninstall anything ever again…for a couple years, at least…

A Good Router Is Hard To Find

It’s a good thing I gave my mother my old router, because the new one hasn’t been working out as well as hoped. It works fine for a while, but then eventually it suddenly loses all of its settings, reverting to defaults and making me restore them from a backup. If it ever happens while I’m away, it’ll leave my wireless network completely open until I can get back and fix it.

It’s hard to tell whether it’s a problem with DD-WRT or with the router hardware, though. I’m leaning towards the latter, as apparently one possible cause of symptoms like this is if the flash memory goes bad, but it’s still hard to prove what the problem really is. And it’s not like Linksys will support a third-party firmware under their warranty, and I really don’t want to shell out another $130 just to test on another router that might well have the same problem.

For now I think I’ll revert back to the official firmware and see if it has trouble as well. Tomato was extremely reliable on my old router, but it doesn’t support this model.

My God, It’s Full Of Pixels

Dell had its regular end-of-quarter sale recently, and I couldn’t resist picking up their 2408WFP monitor. It’s normally fairly expensive, but at around 40% off with the sale, it was a better deal than a lot of plain old mid-level monitors. It also fulfills a few needs of mine, as not only is it bigger (24″ versus 20″), but it has HDCP support and an HDMI input, and also two DVI inputs and a set of component inputs. Too many of my consoles were languishing on sub-optimal inputs already.

I just got it and set it up today, and so far it’s just as good as I’d hoped. It’s about as big as I’d want for the distance I sit away from it since it already pretty much fills my view, the PS3 looks amazing on the HDMI input, I can put both PCs on separate DVI inputs, the 360 can get the VGA input to itself and not go through the KVM, and the Wii can finally use component instead of crummy old S-video.

The only caveat so far is that for the Wii, I have to set the monitor’s scaling mode to ‘Fill’ in order to get a proper widescreen display. But I don’t want it set to that for the PS3, or it scales the 1920×1080 mode up to 1920×1200, stretching things vertically a bit, so the PS3 has to be set to 1:1 or ‘Aspect’ mode. Making sure it’s on the right mode is a minor annoyance, but I can leave it on Aspect 99% of the time since I haven’t been using the Wii much lately anyway.

Edit: Hmmm, I can see some backlight bleed in the corners on the right-hand side when the screen is dark. I don’t think I’ll do anything about it, though; I’ve heard of people returning their screens six or seven times in a row before they got one that didn’t bleed at all, and it’s only really noticeable when the screen is completely dark, so it’s not really that big a deal.