Making the best of a bad situation: Lessons from an Intercom outage

Imagine starting your day with a page about elevated exceptions, diving into Datadog, and uncovering a lurking 32-bit integer limit in one of the most critical parts of your app’s data model.

Well, that’s what happened during a particularly chaotic, and unusual, outage for Intercom. What followed was a five-hour marathon incident response involving monkey patches, migrations, and feature flag gymnastics – but despite the stress, it was certainly educational.
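To make the kind of limit involved concrete, here is a minimal sketch. It assumes a signed 32-bit integer column, which is the usual meaning of a plain integer type in common SQL databases; the post does not say which column or database was affected, and the helper names below are hypothetical, not Intercom's code.

```python
# Assumption: a signed 32-bit integer, as used by a typical SQL INT column.
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold
INT32_MIN = -(2**31)


def fits_in_int32(value: int) -> bool:
    """Return True if value can be stored in a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX


def remaining_headroom(current_max_id: int) -> float:
    """Fraction of the positive 32-bit range still unused (hypothetical helper)."""
    return 1 - current_max_id / INT32_MAX


if __name__ == "__main__":
    print(INT32_MAX)                     # the ceiling itself
    print(fits_in_int32(INT32_MAX))      # the last value that still fits
    print(fits_in_int32(INT32_MAX + 1))  # one more and it no longer fits
    print(f"{remaining_headroom(2_000_000_000):.1%} of the range left")
```

A check along these lines, run periodically against the largest stored value, is one way to get a warning well before writes start failing.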

There was inevitably a lot to learn, and I shared the key lessons in a talk at Rails World 2024.

You can hear the highlights (and lowlights) of that incident, including what went wrong, how we fixed it, and the lessons we learned to prevent it from happening again.

I also dive into some of the technical changes we made after the fact to make our Rails app more resilient.

If you’re curious about how to avoid similar pitfalls – or just enjoy tales of debugging under enormous pressure – check out the video.
