
Back Up (Some Of) That Local Data!


I previously posted about using the local workstation as a data store. I concluded that local storage for some important data is an inevitability, and that’s ok as long as it’s backed up.

Yet I have yet to work at a studio where more than a few workstations had a formal local backup process in place. I believe this is mostly due to how rarely we lose work to catastrophic failures. There’s usually plenty of advance warning from a hard drive before it barfs, giving you time to move over to a new drive. If it ain’t broke often, no reason to fix it in advance. Even in the case of a total sudden failure, it only takes a day or two of re-imaging the OS and reinstalling apps to get back to being productive.

I tend to agree. I don’t think the occasional hard drive failure is worth a general studio-wide full workstation backup policy. It’s just too much work, storage, and time, and is going to cost far more than it will save. Instead, let’s have a more targeted approach. For most people in the studio, I prefer one or more of the options I’m going to enumerate below.

Local-Based Backups

This is the method I use. Even though I just said it’s not a huge deal to have a total failure, some people (like me) are ultra paranoid about losing even the smallest amount of work or time. And for that, nothing beats a local full-drive USB backup.

How?

Big fat USB drives are cheap. You can trade away performance you don’t need for capacity and price. I’d also get one around twice the size of the main workstation drive array if possible. That way you can keep two or three full backup rotations plus incrementals going at once (assuming backup compression).

For software, I’d get something that uses the Volume Shadow Copy Service to do its copies so it can pick up in-use files. I particularly like Ghost even though I loathe most things Symantec makes. It’s important to set up frequent (two to four times a day) incremental backups. Set them to low priority to avoid bothering you while you work. And set up automatic replacement of old backups when the drive is full. The system has to be fully automated and easy to use so you can forget about it until something goes wrong and you need it. Ghost even has a nice feature that lets you use Google Desktop to search your various backups, a little reminiscent of Apple’s Time Machine.
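
If you’d rather script the copy pass yourself instead of (or alongside) a packaged tool, here’s a minimal sketch of the idea, with a made-up drive letter and folder list. Schedule it with Task Scheduler to run two to four times a day. Note that it’s a plain file copy with no Volume Shadow Copy support, so in-use files get skipped; that gap is exactly what a VSS-aware tool like Ghost fills.

```python
import subprocess

SOURCES = [r"C:\work", r"C:\Users\scott\Documents"]  # hypothetical folders to protect
DEST_ROOT = r"E:\backup"                             # hypothetical USB drive

def backup(src, dst):
    # robocopy only transfers new or changed files, which gives a cheap
    # incremental pass; /R:1 /W:1 keeps it from stalling on locked files
    subprocess.run(
        ["robocopy", src, dst, "/E", "/R:1", "/W:1", "/NP"],
        creationflags=subprocess.BELOW_NORMAL_PRIORITY_CLASS,  # stay out of the user's way
    )

if __name__ == "__main__":
    for src in SOURCES:
        name = src.replace(":", "").replace("\\", "_")
        backup(src, rf"{DEST_ROOT}\{name}")
```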

I’d pick this over a mirrored RAID setup any day because you get incremental backups, and you get an easy way to restore data if your workstation dies and you need to move to a new machine quickly.

But Why?

Why am I so paranoid about this? A couple of reasons. The less important one is actually the total-loss issue I mentioned earlier. The main reason I keep frequent incremental full-machine backups going is that I frequently make incremental boneheaded mistakes in my work.

The most common scenario is where I’m working on some small task, and in the middle it explodes into a giant task. Before I know it I’m four days in with 50 files checked out and crap, I just accidentally reverted a bunch because I thought I was working in my other client for that quick bug fix someone needed. Or a day later I confidently go down some new mental path and redo a bunch of work without realizing that this is the wrong way, and now I need to undo back to the previous day. This is especially useful when coding a little drunk after lunch. But if the incremental is only half a day old at worst, you’re not in too much trouble.

This problem is so common for me that I’ve started building some tools to fix it on top of the incremental backup. Right now I’m eyeing git for ideas because it has features like stash and bisect. I’m also considering adding on a continuous backup system (like Norton GoBack or perhaps I’ll mess with Previous Versions a bit more). Like I said, I’m paranoid.
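
To give a taste of the direction I mean, here’s a rough sketch (not an actual tool, and the paths are made up) of layering a throwaway local git repository over the Perforce workspace. Every run records a snapshot you can diff against or roll back to, independent of what’s checked in to the depot:

```python
import subprocess
from datetime import datetime

WORKSPACE = r"C:\work\project"   # hypothetical P4 client root; .git lives alongside it

def git(*args):
    subprocess.run(["git", *args], cwd=WORKSPACE, check=True)

def checkpoint():
    git("init")                   # harmless no-op if the repo already exists
    git("add", "-A")              # stage everything, including deletes
    git("commit", "--allow-empty",
        "-m", f"checkpoint {datetime.now():%Y-%m-%d %H:%M}")

if __name__ == "__main__":
    checkpoint()
```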

Incidentally, this would be a lot less of a problem for me if P4 had a local repository concept like git and mercurial and other more modern source control systems have. Then I could check in as often as I wanted for backups and revision history as I worked, and get all my favorite source control tools helping me as I develop.

The P4 way to fake local repositories is to use private branches, but on a large project they’re surprisingly time-consuming to manage. Perforce does not provide any tools to help out with this, so I’ll have to roll my own here as well. I’ll probably write about this in a future post. It’s getting increasingly frustrating working with P4, especially considering its obnoxious price and how much time I’ve put into building my own tools for it.
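
In the meantime, even something crude helps. Here’s a sketch (hypothetical paths, and nothing to do with private branches, just a stopgap in the same spirit) that copies whatever is currently opened in the workspace into a dated folder, so an accidental revert isn’t fatal:

```python
import os, shutil, subprocess
from datetime import datetime

BACKUP_ROOT = r"E:\p4-checkpoints"   # hypothetical destination

def p4(*args):
    return subprocess.run(["p4", *args], capture_output=True, text=True).stdout

def opened_depot_files():
    # "p4 opened" lines look like: //depot/foo.cpp#3 - edit default change (text)
    return [line.split("#", 1)[0]
            for line in p4("opened").splitlines() if line.startswith("//")]

def local_path(depot_file):
    # "p4 -ztag where" output includes a line like: ... path C:\work\foo.cpp
    for line in p4("-ztag", "where", depot_file).splitlines():
        if line.startswith("... path "):
            return line[len("... path "):]
    return None

def checkpoint():
    dest_root = os.path.join(BACKUP_ROOT, datetime.now().strftime("%Y%m%d-%H%M%S"))
    for depot_file in opened_depot_files():
        src = local_path(depot_file)
        if not src or not os.path.isfile(src):
            continue   # e.g. files opened for delete
        dest = os.path.join(dest_root, depot_file.lstrip("/").replace("/", os.sep))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(src, dest)

if __name__ == "__main__":
    checkpoint()
```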

Avoid the Network

For full-drive backups, I’d stick with a nice simple local USB drive instead of a fancy server-based backup over the network. Doing it from a master server means you have to deal with the IT department every time you want a file back, and the network will get saturated at unpredictable times. That defeats a lot of the benefit of using this system in the first place.

Now, the IT department should at least remotely monitor all of the workstations using this method to make sure that backups are working properly. Someone could easily have a USB hub failure or accidentally kick the connector out and never notice that backups were failing. People can’t be trusted to pay attention to balloon popups in the tray; they’ll just ignore them forever!

But like I said, this method is for the truly paranoid. There aren’t many of those on a team, so most people can use the next method: server-based backups.

Server-Based Backups

Ok, so you’ve got a tape backup and you’re not afraid to use it. You want to roll some local data into the server-based backup, but you don’t want to overwhelm the network and your tape storage with enormous full-drive backups. Here are some other options. They aren’t mutually exclusive, either.

User Folders on a Server Share

Give each user a public and/or private folder on the server (where the private folder has permissions only for them), and map it to a local drive or folder using a domain login script, perhaps P: for public and X: for private. Tell people to keep, copy, or sync files there that they want backed up. Simple, and easy to manage.
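
The mapping itself is normally just a couple of lines in the domain login script. Here’s a sketch of the equivalent, with a made-up server name and share layout:

```python
import getpass, subprocess

user = getpass.getuser()

# P: -> shared public folder, X: -> this user's private folder
subprocess.run(["net", "use", "P:", r"\\fileserver\public", "/persistent:no"])
subprocess.run(["net", "use", "X:", rf"\\fileserver\private\{user}", "/persistent:no"])
```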

There are a couple of big problems with this system. The first is that it puts the burden on the user: they have to decide which of their files get backed up and which don’t. People may forget, or not want to bother, or start and then not keep it up, leaving outdated files sitting in the store (though that can have its own slight advantages). The other problem is that people tend to get lazy and copy large folder trees over (temporary .obj files and all) without worrying much about size or duplication.

Ultimately certain types of users will use the system effectively for local workstation backup and others will not. It’s partly a matter of education but also a reason why (ideally) we want to provide multiple options. But overall this is the main (and usually only) method of local data backup I’ve seen in use at studios I’ve worked at.

Remapped User Folders to the Server

This is a variation on the public/private folder idea. Windows lets you remap local folders (such as Documents, Pictures, and Contacts) to a remote server share. You can just right-click on the folder, go to Properties, and change the folder’s Location to point wherever you want, like a per-user private share on the server.

This is the method we use at home, actually. Ally mapped her Documents folder to our ReadyNAS in the closet. She doesn’t have to do anything special, and her stuff is mirrored, snapshotted every two hours, and backed up offsite. Works great. You can also manually set up NTFS junction points if the OS shell doesn’t permit remapping a particular folder (such as the desktop).
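
One wrinkle: a plain NTFS junction can only point at a local volume, so to point a stubborn folder at a server share you actually want a directory symbolic link (Vista and up, from an elevated prompt). A sketch with made-up paths:

```python
import subprocess

local_folder = r"C:\Users\scott\Desktop"              # folder the shell won't remap
server_share = r"\\fileserver\private\scott\Desktop"  # hypothetical backing share

# mklink is a cmd built-in; /D makes a directory symbolic link.
# The local folder must not already exist (move its contents to the share first).
subprocess.run(["cmd", "/c", "mklink", "/D", local_folder, server_share])
```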

The downside, of course, is that all these files are accessed directly over the network. On our home network, which is very fast over the wire and only has two people using it, it’s not noticeable. But at a game studio with typical minimal IT investment and lots of people simultaneously working with large files, this could be a big problem. Another possible problem is that if the server goes down, people can’t work with these files at all, because no local copies are kept.

Offline Files are a potential option here that I haven’t explored much. I haven’t heard good things about the sync performance, but that’s only anecdotal. Also, it sounds like you need Vista to make it work well, and our industry is really dragging its feet on moving to Vista from XP (sounds like we’re all just going to skip it and jump straight to Windows 7). Other tools like rsync are available, but we might as well use one of the options below instead.

Server Pull of Specific Folders

This option is where the backup server periodically reaches into each workstation and backs up files from a standard set of folders. Perhaps the Desktop and Documents folders, or maybe a special “Backed Up” folder kept on the desktop or in the root of C:\ on each machine (or all of the above). With this option people continue to work with local files, so performance for them stays high.

As with the Remapped User Folders option, educate the team about which folders on their machines get backed up. Then they’ll know that anything they copy to the desktop or save to their Documents folder or wherever (which is the default save location for so many programs) will be safe.
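
Here’s a minimal sketch of the pull itself, assuming the backup server has admin rights on the workstations and can reach their C$ admin shares. Machine names, folders, and the destination are all made up; a real version would discover users and machines from the domain:

```python
import subprocess

MACHINES = ["dev-alice", "dev-bob"]         # hypothetical workstation names
FOLDERS = [r"Users\{user}\Desktop", r"Users\{user}\Documents", "BackedUp"]
DEST = r"D:\workstation-backups"            # on the backup server

def pull(machine, user):
    for folder in FOLDERS:
        rel = folder.format(user=user)
        src = rf"\\{machine}\C$\{rel}"
        dst = rf"{DEST}\{machine}\{rel}"
        # robocopy only transfers new/changed files; /IPG adds a small gap
        # between packets so the pull doesn't saturate the studio network
        subprocess.run(["robocopy", src, dst, "/E", "/R:1", "/W:1", "/IPG:10"])

if __name__ == "__main__":
    for machine in MACHINES:
        pull(machine, user="scott")   # hypothetical; map machine -> user for real
```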

Now, the big downside with this option is that the server has to back up a lot of files over a relatively slow network. It’s more targeted than a full-drive backup, but it’s still going to be quite a lot of data. With all of the other options, where the primary data is kept on the server, the overall network cost is minimal because users are accessing files on demand, and the server can back things up locally through ultra-high-speed links in the server room. But with server pull from workstations, the server must access every file that is a candidate for backup over the studio’s more ordinary network, which takes a lot longer.

The server also has to deal with individual workstation problems. Backups are typically a serial process, so if one workstation is having issues (perhaps it’s running super slow due to an overnight render or batch process), the whole system is bottlenecked.

Exclusions

With all of the above methods, we will have a big problem with users dumping things in their Documents folder that they really don’t need backed up, such as movie trailers, music, game demos, and so on. You can nag people, but it’s easier to tell the server to exclude large junk file types like exe, pdb, mov, avi, mp3, m4a, and iso from any of the local workstation data it backs up.
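
In a robocopy-style pull like the sketch above, that’s just a few extra arguments (the extension list is the same junk list):

```python
# the junk extensions from above, as robocopy /XF patterns
EXCLUDE_TYPES = ["*.exe", "*.pdb", "*.mov", "*.avi", "*.mp3", "*.m4a", "*.iso"]
exclude_args = ["/XF", *EXCLUDE_TYPES]
# e.g.: subprocess.run(["robocopy", src, dst, "/E", *exclude_args])
```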

Roaming Profiles

A quick note on roaming profiles. Don’t use them.

Roaming profiles keep the user’s profile on the server. This includes their entire Users\Username folder minus local-only settings and temp and cache files. The profile is synced with the server on user login and logout. We used these at Loose Cannon before I pushed hard to get them killed.

Roaming profiles are interesting in theory, but in practice, they are a terrible, awful idea. I admit, it’s tempting to have a system that shares all your settings so that you can log into any machine on the domain and have the same setup. You can log into Bob’s machine and all your Visual Studio keyboard shortcuts come with you. Doesn’t that sound nice? But the reality is that sync is ultra slow and makes logging in/out or rebooting frustrating beyond belief. Why is it slow? Well, profiles tend to get really amazingly huge. Everything goes into the user’s profile. And if it hurts to log out/in then it will be even harder to get people to keep their systems patched, which is already a big problem.

Roaming profiles are bad enough with the latest versions of Vista and Windows Server. But they’re even worse on a Linux-managed “domain” (Samba), which is what we had at Loose Cannon way back. Any bugs or compatibility problems in the way the sync happens get propagated down to your local machine, potentially damaging your profile forever. Our Linux-based server was apparently incompatible with Vista, and I got my roaming profile horribly mangled a couple of times until I just forced it to go local. Since then, we’ve been on a nice, easy-to-admin Windows Server with local profiles only and haven’t looked back.

We might have been able to get roaming profiles working right after a lot of work and tuning, but is it really worth the trouble? Is it really all that useful to log into someone else’s machine and have all your settings be the same? Nope! It depends on studio workflow policies (pair programming, standards on supported editors, etc.), but I still say no! Windows is simply not any good at this, and it gets worse at it every year. Perhaps it could work with a massive investment in custom tools, but…bah. Better things to work on.

Ok let’s move on. Next up: back to the series. Data Store 2: Server Shares.


January 6th, 2009 at 11:57 pm

One Response to 'Back Up (Some Of) That Local Data!'


  1. The way I designed a system like this to work for Surreal was to have a little C# app that would run daily (at about 2am).

    This would parse out a config file, which would tell the server which machines to back up, what file types it was allowed to back up, and where the backups should go.

    It would then go out to each machine (with admin privs) and access its LanMan C drive share (e.g. \\dev-scooke\C$) – of course, having admin privs, it could poke around as much as it liked.

    On each machine in the root of C was another config file, specifying what the user wanted to copy (i.e. which folders), whether they wanted an email on success or failure, and what their email address was.

    The server would parse this file, then use Robocopy in backup mode, with low network utilization, to back up their files. It would then pack all those files up into a RAR archive.

    It’d also keep historical backups for a few days. Worked quite well – except when people had immensely nested files and Explorer would have issues reading folders with filename and path combos longer than 256 characters (which is weird, because NTFS supports them).

    Simon Cooke

    12 Jan 09 at 11:42 am
