Quickie: About 404’s You Are Getting Here
This is just a quick note to say that I know all about the broken links!
Starting this weekend I’ll be putting up all my various publications and setting up redirects to fix those 404’s.
Anything pointing at the old Googlepages or Drizzle or Wordpress hosted sites will continue to fail. I closed down those accounts. But using the search box should pick up anything you’re seeking that was formally on either of those sites.
Um, once I add a search box, that is.
On Build Integrity
One of the most important things any studio does is build the game. Compile tools and executables, preprocess resources, pack levels and so on. “Build integrity” is a term I use to cover a set of standards and processes intended to make sure the game build is consistent and reproducible. By this I mean that:
- Build output is independent of environment.
We get an identical build regardless of the machine, its environment, and the person doing the build. In “its environment” I include things like OS patch level, installed software and configuration, environment variables, and registry settings. - Build output is independent of time.
We can reproduce, exactly and simply, builds in any platform and configuration using source materials from any point in time. No special prep must be done to a system, such as downgrading installed software to run an old build. - Build output can be traced to its source.
We can reproduce any build given just a few pieces of information about precisely how it was built, which we can encode compactly into the executable and archive folder name. This lets us match up issues filed against a particular build with exactly what versions of source materials went into that build.
Most experienced studios are already doing this, but not all! At Oberon we worked with a lot of other studios and sometimes when we needed to build their game ourselves we’d hear great stuff like “we couldn’t find all the source code”, “this is an old version but it’s probably fine”, and “Bob’s machine did the builds but he got fired and nobody can find his machine”.
I know of one big pro studio that, upon shipping, takes their build server and sticks it in a closet in case they ever need to build the game again. Is that a safety measure? I’d rather trust a solid build process that is regularly in use on well-managed servers than some old machine gathering dust in the closet. Builds frequently end up rising from the dead! Someone always wants to port it to another platform, reissue it in a ‘classics’ with bugfixes, add integration with social networks, or pack it into an OEM build with new languages added. Or even send out a patch years later for paranoid things like Y2K compliance (remember that?).
Backup servers on ice or restored from an image taken at ship time may work, but we can do better.
Why All the Bother?
You could ask why we should bother with defining a lot of process and standards for the game build. Not that it’s a lot of work, though it could be a major change in mindset if you’re used to just having Bob copy builds up from his machine.
Build integrity is important because we practice science in diagnosing and resolving issues in the game. Detached observation, hypotheses, evidence, elimination of variables, reproducibility, all of that. On modern games on platforms with many processors running simultaneously, and a lot of nondeterministic input from multiple players and the network, we can’t afford to mess around with guesses. We must be able to trust the evidence that we base our assumptions on when we start investigating.
The alternative is madness. Seriously, madness. You don’t want to be near someone who has been debugging some insane problem for two days only to find out that they’ve been working with the wrong build. Whether it’s from a bad build number put into the bug report, or the build having been done by someone with hacked local changes, or consistency problems in the depot, this can really turn a normally nice, mellow person into a rage machine. After hours of carefully eliminating all possible variables and laboriously stepping through optimized assembly, having to start completely over is unbelievably frustrating. Most bugs obviously aren’t like this, but the ones that are always seem to show up at the end of a project and threaten ship dates and marriages.
And that’s just the start. There are a lot more things that can suffer from a lack of build integrity, which I’ll be covering along the way in the next post. The bottom line is that without evidence we can trust, we can end up wasting a lot of time in engineering and testing. We want to set up our build process and practices in a way that ensures high build integrity. The build is the most important output of the studio (besides paychecks of course) and it must be protected.
This goes beyond simply archiving the executable and debug images, or having a dedicated build server. It’s a way of life. Luckily, it’s also pretty easy to set up, and has lots of side benefits such as accelerating setting up a new team member or project. The build platform includes the development platform. More on that in the next post.
Peer Code Reviews: How Did We Do?
This is the sixth in a series of posts on our peer code review process at Loose Cannon Studios. Here’s the full series so far:
- Peer Code Reviews: the “what” and “why” of code reviews for our studio
- First Attempts: our version 1 process that didn’t go so well
- Success: the final process and tools that we ended up with
- Good Commenting Practices: some best practices for making review comments
- About Our Crucible-Perforce Bridge: about our tool that creates Crucible reviews from P4V/Win
Time to analyze how the process worked out. Did we get the benefits we expected? Did we run into the problems we predicted we would? What new things came up that surprised us? Or did it all just devolve into a nit-picky passive-aggressive waste of time?
I’ll compare the last twelve months, where we were doing reviews as per my “success” post, with the six months before it, where almost no reviews were being done and it was the wild west as in the “first attempts”.
Code Reviews Strongly Recommended
Given the number of posts I’m doing on this subject, it should be obvious that I consider our peer code review process to be one of the most vital and successful things we’re doing at Loose Cannon. In fact, I’ve come to believe that code reviews should be a part of every studio’s process.
Not the way we’re doing it, necessarily. Our process was the result of planning by senior team members plus a lot of ongoing feedback from the team to keep it relevant and effective. So it’s necessarily specific to our culture, personalities, and projects. Another studio may need different tools. Or they may ditch the primary reviewer group, or go with pair programming instead.
But I think every studio needs some kind of code review process. I’d do it even with a team size of two.
Revisiting Our Goals
First things first. Let’s revisit our original goals and see if we hit them. Did code reviewing live up to its promises? The short answer is a definite “yes”. Here’s the list from the first post:
- Share Knowledge
- Catch And Correct System Misuse
- Raise The General Quality Of Code
- Mentor Junior Engineers
- Educate About And Enforce Standards
In all of these areas, we had significant improvement.
Share Knowledge
This was a clear win, but there are tradeoffs that highlight the differences between using a real-world conversation and a virtual-world tool. You could say that, ideally, sharing knowledge is best done face-to-face. In person, you get a tight feedback loop that quickly cuts off irrelevant discussion. And obviously speaking and listening are a lot faster than typing and reading, so you can cover a lot more ground.
Yet I already covered how, in our first code review attempts, face-to-face reviews had logistical issues. But more importantly, in the context of knowledge sharing, the spoken word doesn’t get a permalink. None of it gets captured. It can’t be referenced nor linked. It can’t be adapted into a wiki page. It has no recorded history and can’t be searched. It’s inherently private, not public, so we have a lot of duplication, on top of the warping that happens when it becomes the “telephone game”. It gets more difficult and time-consuming the more people that are involved. Lots of problems.
Once the conversation is over, it can only live on as tribal knowledge.
With a code review tool, every comment gets a permalink and is publicly viewable and searchable. Everything gets captured by design. And it’s captured directly in context, right with the lines of code that are being discussed.
I saw many examples of knowledge sharing in our comments. Most of it was casual – throwing out a bit of info about how a function works or what a particular order of calls should be. I gave some examples of these in my post on Good Commenting Practices. But often it was more structured. The reviewer knew that there was an opportunity to explain in detail some nuanced part of a system and took the time to write a fuller description. A frequent reply I saw was “oh! I didn’t know you could do that!” Cool.
Reviewees often pre-commented their reviews with questions. Even though they already had a working, tested change, they wanted to know more about the systems they were using. This is great – so often I see engineers not caring much about how a system works, just wanting to hit “.” and flip through the Intellisense list to look for a relevant function to call. In a review they’re able to ask questions when it’s already on their minds, in context. This happens much less often in casual conversation!
Sharing went the other way as well. As I reviewed systems under development I learned more about how they worked. Through the small snapshot of a single change I wouldn’t usually understand much. But over time, as I watched things evolving, and reviewed different parts, I eventually got a pretty good idea of how parts of the game worked that I normally would have little contact with. I also learned how different engineers tended to do things, and learned some new techniques along the way. When it came time for me to go into any given system to help debug it at the end of the project, I usually had some familiarity with it and knew what I was getting into.
Catch And Correct System Misuse
This is a special case of Share Knowledge, and was clearly improved with the new process. In particular, the ability to easily add multiple extra reviewers for their domain-specific knowledge was a key advantage. There are many examples in our review database of a domain expert chiming in on something like where to put new system-wide Nant properties, or how a particular change to a state machine isn’t taking into account some dependency. People also got comfortable bringing in new reviewers with domain expertise and directing them to some extra-grognardy part of their change for a focused review.
It was also a good way to detect where we had bad API design. This was something that seems obvious but I wasn’t expecting. If people are continuing to get confused and doing the wrong thing across multiple reviews when using a given system, it means there’s something very wrong with it.
For example, a couple years ago I wrote a set of four array classes, each meant to be used in a very specific way. After doing a lot of reviews involving usage of these classes, it seems that most people still don’t understand the difference among them despite heavy documentation and regular reminders in reviews. Time to rewrite. I get a much stronger message that way than I would hear in casual conversation.
Raise The General Quality Of Code
This is an area I saw huge improvement. It really was the wild west before we started doing reviews again. Everybody had a different style and used different patterns, and the code looked like a mess. Parts are still a mess now, of course, but I feel like we’re mostly rowing the boat in the same general direction.
Further than just stylistic improvement, the overall quality of changes went up. I seriously think this is just the peer pressure at work here. If you know that someone is going to be seriously looking over your code and not just complaining when it crashes the game, you will take more time to get it right before considering it ready to go. As a user of code, I’d normally only care about the public interface to it and its performance characteristics. As long as it is self documenting and works as it should, I shouldn’t care a whole lot what’s going on inside. But as a reviewer, I’m looking deep inside the implementation, and the reviewee knows it.
This also lets me more easily become a maintainer of the code in the future if necessary, like if the owner goes on vacation. Towards the end of a project, we all tend to jump from system to system, fixing all kinds of random bugs wherever we can help out. Being suddenly dumped into working on someone’s crazy implementation code that I previously hadn’t been caring about is frustrating. It’s so much better to have been familiar with it already through regular reviews.
Mentor Junior Engineers
I mentioned earlier in this article that knowledge sharing may be done best face-to-face, but in the case of mentoring junior engineers I think the written form is probably better. At minimum a set of written comments should be the start of a conversation. It lets you clearly define what the new technique is that you’re trying to show to them, and it’s recorded so they (or anybody else) can refer back to it whenever they need. Having it in text form lets the junior engineer really take their time in absorbing the knowledge and thinking about it. They can try out the unfamiliar technique and get to know it a little before responding to find out more information or perhaps argue about it.
Speaking of which, being written makes it easier for engineers to challenge their mentors in a less confrontational way. As we all have learned with the Internet, people are willing to say things in text that they wouldn’t in person. So if someone doesn’t buy an argument, it’s a lot easier to state that as a reply to a comment in a review than by directly challenging them on it. I often prefer the direct-challenge approach myself, but lots of folks aren’t like this. You could expect this to lead to passive-aggressive discussions but from what I’ve seen so far in our reviews this hasn’t been a significant issue.
Educate About And Enforce Standards
Given that we didn’t have coding standards until the new review process was in place, I can’t compare this with the pre-review period. But with the new process, I noticed a few things:
- New code written by different people started to all look similar. People were adhering to standards. This was awesome. During the wild west time, we had at least four drastically different styles in play.
- New engineers who came on board and had trouble adjusting did it a lot quicker than I’ve seen in the past due to the regular reminders they got in reviews.
- Weak or ambiguous points in the coding standards were found very quickly. Reviewees often pointed these out, which then got escalated and either resolved or postponed for later discussion.
- Areas where we needed to decide on new standards were found quickly. And those new rules got out to the rest of the team quicker and more effectively through reviews than just by email.
We also realized that we needed best practices on top of basic coding standards. What’s the standard naming for resource allocation/release functions? When and how is it appropriate to assert state? How should we lay out platform-common and platform-specific code for a given class or system? Recently, this has expanded into big discussions about our overall engineering design process. I plan to post on this topic in the future as we figure things out.
Comments related to standards compliance varied a lot among reviews and reviewers. Some reviewers spotted violations right away, and others will miss 95% of them. It didn’t matter when things were missed. Eventually, with enough reviews, people would start coding to the standard. I think the mix of attention levels helped in keeping things from getting too nit-picky.
Towards the end of the project we altered our process to just stop bothering with comments about coding standards. None of that code (usually certification-related) was going to be brought forward to the next game and we wanted to really focus on getting the game shipped.
Version 1 Problems Fixed?
Back in the first attempt, I identified three critical problems that contributed to the death of the original process. So how did we fare on these in the final process?
Mini-Cliques
This was primarily a physical location issue that disappeared when we switched to using Crucible and started working mostly in the virtual world.
At one point, though, one of the primary reviewers noticed that they were never getting reviews from a couple engineers on the team. These engineers were overriding the random automatic choice of crucreate and always going with the same reviewer. In both cases, it was a misunderstanding that was easily resolved.
I ran some stats to see what the spread was for each engineer and everybody else was ok, spreading their reviews out pretty well. But it did bring up some enhancements I’d like to make to crucreate’s chooser dialog when showing potential reviewers. Just a little more info to help the random chooser and give more information to the reviewee:
- What percent of your reviews has each reviewer been on in the last 30 days?
To prevent favoring a reviewer too much by accident. - What is the current review load of each reviewer?
To prevent overloading a reviewer who already has a big queue. - Is the reviewer active at their workstation?
To prevent people forgetting when someone’s taken vacation or is out to a doctor appointment or whatever. We can detect this from their idle status on the Jabber server.
It would be fun to implement those features, but it’s admittedly low priority stuff.
Lack of Simultaneous Availability
This is another physical issue that was mostly solved by doing reviews in the virtual world. I say “mostly” because obviously it doesn’t change the situation when you need to ask questions and discuss a change in person.
In fact, it’s sometimes worse in these cases. Unless someone is in a rush to check in, they usually don’t pay any attention to a reviewer being available at that moment. There’s no need, and furthermore it’s a major benefit of the process. So nobody’s thinking about timing all that much.
The problem comes up when a reviewer needs to discuss a change in person, and the reviewee just took off. Now you have to do the scheduling time thing to get together. It’s not a big deal, all that happens is the review is delayed. But as a reviewer it’s always frustrating when I’m down in the middle of a big review, and have a small question, and the reviewee is nowhere to be found.
I’ve dealt with this a couple ways. As a reviewer, I can spend a minute skimming the review to get an idea of its complexity. If it’s going to be a tough one, I can check on the availability of the reviewee before digging in over IM. “This is going to be a tricky review – you around for the next hour or so in case I need to ask you anything?” Or as a reviewee, I usually know in advance if a review will need discussion or not. If it will, I’ll avoid sending it out if I’m about to leave the office. Or I’ll add a general comment at the top saying “this might be tricky, we may need to discuss before you get too deep, let me know” kind of thing.
The Blind Leading the Blind
We handled this in a couple of ways that I think worked: the primary reviewers group, and adding multiple reviewers.
Requiring a primary reviewer means you’re guaranteed a more senior person. And the ability to have multiple reviewers working on a review simultaneously means it’s easy to add in people when you don’t know what you’re doing in a particular area. This helps out significantly with the blind leading the blind.
There’s also accountability with the system, so you can trace back problems and get them resolved. With in-person reviews the entire conversation is lost due to being all verbal. But with our process, every change has an associated review. It’s easy enough to find out who reviewed a change that caused big problems and “review the reviewer”.
Special Thanks
I want to give some extra props to Atlassian. It took us many months to really settle on Crucible as our tool of choice and then pay for it and get a real license. We had a few false starts and it took a while to give it a real shot. During this time, Atlassian had no problem continuing to renew our evaluation licenses (as they did previously with Jira and Confluence).
This was great because it let us get the experimentation and testing we needed, not to mention the budget. 30 days is a really short time to check out a workflow tool, particularly one of this magnitude for us. I wish everybody had such a liberal policy with their evals. Not to mention using a simple license key and not requiring some obnoxious DRM licensing server or annoying things like IP address locking like so much other software we use. So hats off to them.
