The Success of the GitLab Failure

You might have heard that GitLab had an incident a couple of weeks ago. They lost six hours’ worth of data when an employee (a sysadmin) deleted what he thought was an empty folder. Then, after the delete, GitLab discovered that all five of their recovery systems had either failed or weren’t working in the first place. Two weeks out, you can find the facts of the situation on most tech blogs, along with suggestions for how to avoid your own GitLab disaster in the future.

There are a lot of things GitLab could have done better. They could have had a working backup system, for instance. But it would be a waste if the situation ended up in textbooks only as a classic “what not to do” example. The real lesson of the GitLab disaster is in the quality of their response to a serious situation. So go check your backup systems, but then come back, because there is more to learn.

1. Don’t kill the messenger

If your first thought was “that sysadmin is fired,” you are not alone. It just makes sense, right? You delete your company, you get fired. But GitLab didn’t fire the sysadmin. They said everyone makes mistakes.

If nothing else, that should interrupt our well-trained thought process. In most workplace cultures, we talk a lot about encouraging risk-taking and mistake-making as a valuable way to learn. But how often is that actually practiced? Maybe it’s time to take stock of our mistake-response practices, stop thinking of mistakes as a luxury, and start looking at them as an integral part of the growth process.

Did the sysadmin make a mistake? Of course. You probably shouldn’t force-delete a folder without double-checking first. But if you take a step back, you could say that the sysadmin became an unplanned GitLab software tester for about three minutes, and in that time he found a vital flaw that needed to be addressed. The sysadmin deleted a folder. He didn’t cause all five backup systems to fail. GitLab’s response was one companies often only theorize about: don’t punish mistakes, use them.

2. Transparency is better than excuses or silence

When GitLab went down, they immediately tweeted about it. Then they opened a Google Doc where they shared live problem-solving as they worked through responses to the incident. They shared that doc with anyone who wanted to look at it. How do we know a sysadmin deleted a folder? Because GitLab told us. They also told us their backups were failing or not turned on. They were totally upfront and honest about the situation as it was happening, and they have continued to be after the fact.

They took flak for it. Some said they were too transparent. You can debate the merits of that argument. But here’s a question: on a gut level, do you think more or less highly of GitLab at this point, knowing what happened and how they responded? They didn’t have to keep their users up to date on their progress, but they did. They were honest, upfront, and clear. They didn’t sugarcoat the issue, and they didn’t shy away from admitting their failures.

This is in direct contrast to the general political scene. And you don’t have to look very far to see the difference transparency makes. It’s a big deal. It builds trust and loyalty, and it precludes awkward backtracking conversations later. It does come with risks, of course. If you tell people what you did wrong, they will probably be mad. But they’ll also respect you for being honest. They’ll probably forgive you, too. Because in the end, even when people are mad, they would still rather hear the truth.

“These things happen, and we try to deal with them as professionally as we can. This means being honest and direct. We can be proud of our abilities, but we must be honest about our shortcomings—our ignorance as well as our mistakes.” – The Pragmatic Programmer

3. Learn from mistakes

This sounds obvious, because it is. The problem is that what’s obvious often gets overlooked. How often do we make mistakes and then shove the whole mess under the rug, hoping no one will notice the lump it makes? It’s a high ideal to learn from our mistakes, but it’s a much messier thing to actually do the learning. It involves meetings and postmortems and serious work to fix things. It requires creativity and admitting we might be wrong, and sometimes it means totally scrapping beloved processes in favor of new ones.

The good thing is that if you’ve been transparent about your mistakes, you have a built-in accountability system for fixing them, because everyone will notice if you don’t. It’s a painful process; no one will deny that. We all have a bit of the perfectionist in us that wants to deny ever making a mistake. That can kill a company. If we never admit a mistake, we end up with piles of problems so big the rug can’t cover them anymore. In software development we call that “software rot.” In life we just call it a bad idea.

On the flip side, if we admit mistakes, then dig in and figure out how to do better next time, well, that’s how companies grow. Clean up the mess, straighten the rug, and figure out why the thing broke in the first place so we can make sure it doesn’t break again. It’s called responsibility, and it’s as effective as it sounds.

Learn from GitLab’s mistakes: don’t delete folders without making sure you know what’s in them, and check your backup systems before disaster strikes. But learn from GitLab’s successes, too. Treat your employees with some empathy, don’t be afraid to fail, and be honest about it when you do. Making mistakes means you’re also making progress.
