Degraded access to admin panel

Postmortem

April 17, 2026 at 1:28 PM

What happened
On April 8, about 1.5% of admin panel traffic started failing. API endpoints were not affected. The issue was fully resolved on April 9.

Why it happened
We recently rolled out a new version of HAProxy, the software that routes user traffic to our services, across our fleet of six load balancers. The new version had been validated in a test deployment and initially looked healthy in production as well. However, about 7 hours after the upgrade, one of the six load balancers started failing to reach backend services, and that caused the errors users experienced.

The root cause is a bug in the new HAProxy version that only shows up under a very specific, still-unidentified set of conditions. Our test deployment and the other five production balancers kept working fine, which is why the issue slipped past our pre-rollout checks.

Why it took us a while to notice
When a request failed, our system automatically retried it, and the retry usually succeeded. So from the outside, things looked a bit slow or occasionally flaky rather than clearly broken. Automatic retries are normally a good thing, since they hide small, transient glitches from users, but in this case they also masked the growing problem from our monitoring long enough to delay detection.
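To illustrate how retries can hide a failing balancer, here is a simplified, hypothetical simulation in Python. It is not our production retry logic: it assumes six balancers served round-robin, one of which (index 3, chosen arbitrarily) fails every attempt, and a single retry that goes to the next balancer. All names in it (`NUM_BALANCERS`, `BROKEN`, `send`, `simulate`) are made up for illustration. Counting only final request outcomes shows no errors, while counting individual attempts exposes the broken balancer.

```python
from dataclasses import dataclass

# Hypothetical model: six balancers, one of which is broken.
NUM_BALANCERS = 6
BROKEN = {3}  # arbitrary index of the unhealthy balancer


@dataclass
class Stats:
    requests: int = 0         # user-facing requests
    failed_requests: int = 0  # requests that failed after all retries
    attempts: int = 0         # individual tries, including retries
    failed_attempts: int = 0  # tries that failed


def send(balancer: int) -> bool:
    """Simulated attempt: the broken balancer always fails, others succeed."""
    return balancer not in BROKEN


def simulate(num_requests: int, max_retries: int = 1) -> Stats:
    stats = Stats()
    for i in range(num_requests):
        stats.requests += 1
        balancer = i % NUM_BALANCERS  # round-robin choice for the first attempt
        for _ in range(max_retries + 1):
            stats.attempts += 1
            if send(balancer):
                break
            stats.failed_attempts += 1
            balancer = (balancer + 1) % NUM_BALANCERS  # retry on the next balancer
        else:
            # Runs only if no attempt succeeded (the loop never hit `break`).
            stats.failed_requests += 1
    return stats


s = simulate(600)
print(f"request error rate: {s.failed_requests / s.requests:.1%}")  # request error rate: 0.0%
print(f"attempt error rate: {s.failed_attempts / s.attempts:.1%}")  # attempt error rate: 14.3%
```

In the actual incident some requests still failed, so the masking was partial rather than complete; the sketch only shows why a metric based on final outcomes can understate a problem that attempt-level metrics reveal.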

What we did
- Rolled back the affected balancer to the previous HAProxy version, which immediately restored normal traffic.
- Paused all further HAProxy upgrades across the fleet until we understand the trigger.

What's next
We're working to reproduce the bug in a controlled test environment so we can pinpoint the trigger, confirm a fix (either a patched version or a config change), and safely resume the rollout.

Resolved

April 09, 2026 at 4:28 PM

This incident has been resolved. We will share more details at a later date.

Investigating

April 08, 2026 at 3:37 PM

We are currently investigating elevated error rates affecting approximately 1.5% of traffic to some internal web services. API endpoints are operating normally and are not affected by this incident.
