Summary:
On 2025-12-17, our API Gateway was degraded for about 30 minutes (11:25 to 11:55 UTC) after a failed update of an ingress component.
Impact:
Elevated 5xx error rate on the API Gateway.
Average 5xx rate over the incident interval: ~1.76% of all requests.
Peak 5xx rate: ~9–10% of all requests.
Partial request failures for API clients during the incident window.
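
The average and peak figures above measure different things, and a minimal sketch may help show how. It assumes per-minute request and 5xx counts are available from the gateway metrics. The `MinuteSample` type, the `error_rate_summary` function, and the sample numbers are hypothetical and are not data from this incident. The average is total 5xx responses divided by total requests over the window. The peak is the highest per-minute ratio.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MinuteSample:
    """One minute of gateway traffic (hypothetical shape, not a real export format)."""
    minute: str       # "HH:MM" in UTC
    requests: int     # total requests in that minute
    errors_5xx: int   # 5xx responses in that minute


def _rate(sample: MinuteSample) -> float:
    # Guard against minutes with no traffic.
    return sample.errors_5xx / sample.requests if sample.requests else 0.0


def error_rate_summary(samples: List[MinuteSample]) -> Tuple[float, float, str]:
    """Return (average_rate, peak_rate, peak_minute) for the window."""
    if not samples:
        raise ValueError("at least one sample is required")
    total_requests = sum(s.requests for s in samples)
    total_errors = sum(s.errors_5xx for s in samples)
    # Request-weighted average over the whole window.
    average = total_errors / total_requests if total_requests else 0.0
    # Peak is the single worst minute.
    worst = max(samples, key=_rate)
    return average, _rate(worst), worst.minute


# Illustrative numbers only; these are NOT measurements from the incident.
example = [
    MinuteSample("00:00", 1000, 5),
    MinuteSample("00:01", 1000, 100),
    MinuteSample("00:02", 2000, 15),
]
average, peak, peak_minute = error_rate_summary(example)
print(f"average={average:.2%} peak={peak:.2%} at {peak_minute}")
```

Because the average is weighted by request volume across the whole window, a short spike can produce a peak several times higher than the average, which is consistent with the gap between the two figures reported above.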
Timeline (UTC):
11:25 — 5xx rate begins to increase.
~11:35 — Peak degradation observed.
~11:55 — Errors drop back to baseline and service stabilizes.
Incident Duration:
From 11:25 to 11:55 (UTC) — ~30 minutes.
Root Cause:
A failed deployment of an ingress component introduced regressions that affected API Gateway request handling, resulting in increased 5xx responses.
Resolution:
We rolled back the ingress component update and confirmed that the API Gateway error rate returned to baseline.