Making the best of a bad situation: Lessons from an Intercom outage

Imagine starting your day with a page about elevated exceptions, diving into Datadog, and uncovering a lurking 32-bit integer limit in one of the most critical parts of your app’s data model.

Well, that’s what happened during a particularly chaotic, and unusual, outage for Intercom. What followed was a five-hour marathon incident response involving monkey patches, migrations, and feature flag gymnastics – but despite the stress, it was certainly educational.

Inevitably, there was a lot to learn, and I share the key lessons in this talk at Rails World 2024.

You can hear the highlights (and lowlights) of that incident, including what went wrong, how we fixed it, and the lessons we learned to prevent it from happening again.

I also dive into some of the technical changes we made after the fact to make our Rails app more resilient.
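As a rough illustration of the kind of limit involved (this is not code from the incident): a signed 32-bit integer tops out at 2,147,483,647, so a column of that type cannot store anything larger. The sketch below is in Python for brevity and assumes a signed column. The names `INT32_MAX`, `fits`, and `headroom` are hypothetical and exist only for this example.

```python
# Largest value a signed 32-bit integer can hold: 2**31 - 1.
INT32_MAX = 2**31 - 1


def fits(value: int, limit: int = INT32_MAX) -> bool:
    """Return True if value is within the signed range bounded by limit."""
    return -limit - 1 <= value <= limit


def headroom(current_max: int, limit: int = INT32_MAX) -> float:
    """Return the fraction of the positive range still unused (0.0 to 1.0)."""
    if current_max < 0:
        raise ValueError("current_max must be non-negative")
    return max(0.0, (limit - current_max) / limit)


if __name__ == "__main__":
    # Assumed example value: an ID counter sitting close to the ceiling.
    current = 2_100_000_000
    print(f"fits next value: {fits(current + 1)}")
    print(f"headroom left: {headroom(current):.2%}")
```

A check along these lines, run periodically against the largest stored value, is one way to get a warning well before the ceiling is reached.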

If you’re curious about how to avoid similar pitfalls – or just enjoy tales of debugging under enormous pressure – check out the video.
