maestra - Notice history

All systems operational

Admin panel - Operational

100% uptime (Mar 2026: 100.0% · Apr 2026: 100.0% · May 2026: 100.0%)

Site personalization - Operational

100% uptime (Mar 2026: 100.0% · Apr 2026: 100.0% · May 2026: 100.0%)

Loyalty - Operational

100% uptime (Mar 2026: 100.0% · Apr 2026: 100.0% · May 2026: 100.0%)

Mass campaigns - Operational

100% uptime (Mar 2026: 100.0% · Apr 2026: 100.0% · May 2026: 100.0%)

Transactional messages - Operational

100% uptime (Mar 2026: 100.0% · Apr 2026: 100.0% · May 2026: 100.0%)

Notice history

May 2026

No notices reported this month

Apr 2026

Degraded access to admin panel
  • Postmortem

    What happened
    On April 8, about 1.5% of admin panel traffic started failing. API endpoints were not affected. The issue was fully resolved on April 9.

    Why it happened
    We recently rolled out a new version of HAProxy, the software that routes user traffic to our services, across our fleet of six load balancers. The rollout had been validated in a test deployment and initially looked healthy in production as well. However, about 7 hours after the upgrade, one of the six load balancers started failing to reach backend services, and that is what caused the errors users experienced.

    The root cause is a bug in the new HAProxy version that only shows up under a very specific, still-unidentified set of conditions. Our test deployment and the other five production balancers kept working fine, which is why the issue slipped past our pre-rollout checks.

    Why it took us a while to notice
    When a request failed, our system automatically tried it again, and the retry usually succeeded. So from the outside things looked a bit slow or occasionally flaky rather than clearly broken. That automatic retry behavior is normally a good thing — it hides small, transient glitches from users — but in this case it also hid the growing problem from our monitoring long enough to delay detection.
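
    To make the masking effect concrete, here is a minimal sketch (Python, with hypothetical numbers and names; not our actual client code) of how transparent retries can hide a high per-attempt failure rate from monitoring that only looks at final outcomes:

      import random

      FAILURE_RATE = 0.09  # hypothetical per-attempt failure rate on one bad balancer

      def call_backend():
          # Simulated request: fails ~9% of the time.
          return random.random() >= FAILURE_RATE

      def request_with_retry(max_attempts=3):
          # Retries transparently; returns (final_ok, first_attempt_ok).
          first_attempt_ok = call_backend()
          if first_attempt_ok:
              return True, True
          for _ in range(max_attempts - 1):
              if call_backend():
                  return True, False  # a retry saved the request
          return False, False

      N = 100_000
      results = [request_with_retry() for _ in range(N)]
      final_errors = sum(1 for final_ok, _ in results if not final_ok)
      first_errors = sum(1 for _, first_ok in results if not first_ok)

      # With three attempts, the final error rate is roughly 0.09 ** 3 (~0.07%),
      # even though ~9% of first attempts fail.
      print(f"final error rate:         {final_errors / N:.4%}")
      print(f"first-attempt error rate: {first_errors / N:.4%}")

    The practical lesson for us is to alert on first-attempt error rates and retry volume, not only on the error rate that clients ultimately see.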

    What we did
    - Rolled back the affected balancer to the previous HAProxy version, which immediately restored normal traffic.
    - Paused all further HAProxy upgrades across the fleet until we understand the trigger.

    What's next
    We're working to reproduce the bug in a controlled test environment so we can pinpoint the trigger, confirm a fix (either a patched version or a config change), and safely resume the rollout.

  • Resolved

    This incident has been resolved. We'll share full details in a postmortem at a later date.

  • Investigating

    We are currently investigating elevated error rates affecting approximately 1.5% of traffic to some internal web services. API endpoints are operating normally and are not affected by this incident.
