So, while my drive recovery grinds away (current ETA: 8.5 days) and I wait for the arrival of new drives, I need to step back and rethink how I do my data storage and backup to begin with.
Previously, I thought I had a fairly decent setup: one big storage drive with automatic weekly differential backups via ‘dar’ to an external USB drive. A new full backup started once a month, with one previous backup set still kept around, and every once in a while I would physically swap the external drive with an offsite one for disaster recovery.
That sounds alright, and it worked well for a while, but the problem was that I let the backup solution fall behind my main storage needs and started compromising on things. As the amount of data I had grew, I started running the backups less often so that the backup drive wouldn’t fill up so quickly. Soon there just wasn’t room for both the previous and current backup sets, so I got rid of the previous one. I had to do full backups from scratch more often because there just wasn’t much room left for the differentials. I got lazier about how frequently I did the offsite drive swap. And eventually, I’d upgraded my storage drive to 4TB while the backup drive was still a mere 1.5TB, so I had to start excluding stuff like the ripped DVDs from the backup because they just wouldn’t fit. This all left my backups in a more fragile state than I’d realized, leading to my current hassles.
Once this is all cleaned up, I obviously need something better. Something simpler, less prone to error and laziness and compromise.
Right now, my basic line of thought is:
- Still have one big internal storage drive, an external backup drive, and an offsite replica.
- The backup drives must always be at least as large as the storage drive.
- Instead of differentials, the backup drive is a straight mirror of the storage drive.
- The mirroring is done automatically on a schedule via ‘rsync’, instead of my horribly convoluted wrapper script around ‘dar’.
This should avoid most of the aforementioned problems, except perhaps the offsite swapping laziness, which I’m just going to have to do better at. The data is guaranteed to fit, there’s no need to compromise or exclude anything, the syncing can be done safely and automatically, and in the event of drive failure, the backup drive can swap straight in as the storage drive. The downside is that plain mirroring will also mirror any mistakes I make, if I don’t notice and fix them before the next sync occurs. Hopefully most mistakes will be heat-of-the-moment ones I can restore immediately, and in the worst case I can still resort to the offsite drive for an older copy.
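To make the plan a bit more concrete, here is a rough sketch of what the scheduled job might look like in Python. It is only an illustration under some assumptions: the function names and the mount points `/mnt/storage` and `/mnt/backup` are made up, the size rule is checked by comparing total filesystem sizes, and the rsync flags (`-a` with `--delete` for a straight mirror) are my guess at a reasonable starting point rather than a tested configuration. It defaults to a dry run, which lists what would change without touching the backup and so gives one chance to catch a mistake before it gets mirrored.

```python
from __future__ import annotations

import shutil
import subprocess


def backup_fits(source: str, backup: str) -> bool:
    """Return True if the backup filesystem is at least as large as the source's."""
    return shutil.disk_usage(backup).total >= shutil.disk_usage(source).total


def build_rsync_command(source: str, backup: str, dry_run: bool = True) -> list[str]:
    """Build an rsync command that makes `backup` a mirror of `source`."""
    # -a preserves permissions, times, and symlinks; --delete removes files
    # from the backup that no longer exist on the source (a true mirror).
    cmd = ["rsync", "-a", "--delete"]
    if dry_run:
        # Only report what would change; nothing is written or deleted.
        cmd += ["--dry-run", "--itemize-changes"]
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [source.rstrip("/") + "/", backup.rstrip("/") + "/"]
    return cmd


def mirror(source: str, backup: str, dry_run: bool = True) -> None:
    """Refuse to sync onto a too-small drive, otherwise run rsync."""
    if not backup_fits(source, backup):
        raise RuntimeError("backup drive is smaller than the storage drive")
    subprocess.run(build_rsync_command(source, backup, dry_run), check=True)
```

A cron job could then call `mirror("/mnt/storage", "/mnt/backup", dry_run=False)` for the real sync, while a manual `mirror("/mnt/storage", "/mnt/backup")` beforehand previews the changes. Note that `--delete` is exactly what makes this a mirror, and also exactly what propagates an accidental deletion to the backup.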