This page is a summary of past incidents. Level 1 incidents (disruptions) are published here within 7 days, and level 2 incidents (emergencies) within 3 days.
Current information on ongoing incidents can be found at https://status.eduid.ch
Date, Time, Duration | 28.2.2025, 09:12, 10 minutes |
---|---|
Severity | Level 1 - Service Disruption |
Incident Summary | All users encountered a timeout message during the login process. |
Affected users | Potentially every login during the time frame |
Root cause analysis | The deployment of a faulty configuration on the load balancers led to the unavailability of most Identity Provider (IdP) nodes. The remaining active nodes were not in a proper working state. |
Resolution and recovery | The faulty load balancer configuration was rolled back as soon as our monitoring system sent out an alert and the mistake was noticed. |
Preventive measures, future actions and other learnings | |
Date, Time, Duration | 17.2.2025, 10:10, 30 minutes |
---|---|
Severity | Level 2 - Degraded Performance |
Incident Summary | Most users experienced a slow login process or saw a timeout message. |
Affected users | Potentially every login during the time frame |
Root cause analysis | The combination of a temporary high load and a rolling upgrade of the IdP nodes led to degraded performance. Only half of the IdP nodes were active after 10:00, and these nodes were not able to handle the incoming login requests in an acceptable amount of time. |
Resolution and recovery | The overloaded nodes were disabled and traffic was switched over to the spare nodes. Since the request peak was already over by then, the remaining nodes could handle the load. The previously overloaded nodes were re-enabled once they had recovered. |
Preventive measures, future actions and other learnings | |
Date, Time, Duration | 12.2.2025, 13:45, 20 minutes |
---|---|
Severity | Level 1 - Service Disruption |
Incident Summary | Some users were not able to authenticate with MFA. |
Affected users | About 40% of the users who needed to log in with a second factor or passkey during the incident were affected and had to retry later. |
Root cause analysis | The IdP nodes authenticate to our internal APIs using mTLS. The renewal of the involved certificates led to authentication failures on two of the five IdP nodes, due to a mismatch of machine identities in the API configurations. The mismatch was caused by a coincidence of two factors: the introduction of new IdP nodes and the use of a shared client certificate on all nodes. |
Resolution and recovery | The certificate configuration on the IdP instances was rolled back to its previous state, following the documented rollback strategy. |
Preventive measures, future actions and other learnings | |
Date, Time, Duration | 21.1.2025, 09:00, 3 hours |
---|---|
Severity | Level 1 - Service Disruption |
Incident Summary | edu-ID users could not create a new account. |
Affected users | Around 200 users were affected and had to retry later. |
Root cause analysis | The underlying cause was a new version of PHP, which prevented the account management from writing changes back to the user database. |
Resolution and recovery | PHP was rolled back to the previous version. |
Preventive measures, future actions and other learnings | |
Date, Time, Duration | 16.12.2024, 10:00, 30 minutes |
---|---|
Severity | Level 1 - Service Disruption |
Incident Summary | Some edu-ID users (but not all) did not receive SMS messages for around 30 minutes (10:00-10:30). The delivery failures occurred in bursts. |
Affected users | This particularly affected the 2-step login for users without TOTP. |
Root cause analysis | The SMS provider reported that it "experienced unexpected network issue", which caused some SMS messages not to be sent. |
Resolution and recovery | The problems occurred in bursts and not for all users. Therefore, by the time a few users made the edu-ID team aware of the issue, it had already been resolved. In addition, due to a monitoring problem, the issue was not reported earlier. |
Preventive measures, future actions and other learnings | |
Date, Time, Duration | 19.9.2024, 10:03, 30 minutes |
---|---|
Severity | Level 2 - Service Disruption |
Incident Summary | edu-ID users did not receive SMS messages for around 30 minutes (10:00-10:30). This particularly affected the 2-step login for users without TOTP. |
Affected users | In total, there were 1793 unsent SMS messages (logged without a delivery report) between 10:00 and 10:36 on 19.09.2024. In the same time range on the previous day, there were only 23 undelivered SMS messages. The unsent messages were associated with 621 different mobile numbers, so we conclude that about 620 users did not receive their requested SMS for mobile verification. |
Root cause analysis | The cause of the problem was congestion in the delivery queue of our primary SMS provider. |
Resolution and recovery | After half an hour, we switched over to our alternative SMS provider so that SMS messages could be sent again. The primary SMS provider resolved the problem later, and we switched back to it before noon. |
Preventive measures, future actions and other learnings | |
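The affected-user estimate for the 19.9.2024 incident (count log entries without a delivery report in the time window, then count the distinct mobile numbers behind them) can be sketched as follows. This is a minimal illustration under assumptions: the record type `SmsLogEntry` and its fields are hypothetical, since this page does not describe the actual log format, and, as in the report, one mobile number is treated as one user.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Tuple


@dataclass
class SmsLogEntry:
    # Hypothetical log record; field names are assumptions, not the real schema.
    msisdn: str                     # recipient mobile number
    sent_at: datetime               # time the SMS request was logged
    delivery_report: Optional[str]  # None if no delivery report was received


def estimate_affected(entries: Iterable[SmsLogEntry],
                      start: datetime, end: datetime) -> Tuple[int, int]:
    """Return (unsent SMS count, distinct mobile numbers) within [start, end)."""
    unsent = [e for e in entries
              if start <= e.sent_at < end and e.delivery_report is None]
    return len(unsent), len({e.msisdn for e in unsent})


if __name__ == "__main__":
    t = datetime(2024, 9, 19, 10, 5)
    # Made-up example records, not data from the incident.
    log = [
        SmsLogEntry("+41790000001", t, None),
        SmsLogEntry("+41790000001", t, None),         # retry by the same user
        SmsLogEntry("+41790000002", t, None),
        SmsLogEntry("+41790000003", t, "delivered"),  # delivered, not counted
    ]
    print(estimate_affected(log, datetime(2024, 9, 19, 10, 0),
                            datetime(2024, 9, 19, 10, 36)))  # (3, 2)
```

Counting distinct numbers rather than messages avoids counting a user twice when they request a new code after the first one did not arrive, which is why 1793 unsent messages correspond to only about 620 users.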
Date, Time, Duration | 18.9.2024, 02:00, 7.5 hours |
---|---|
Severity | Level 2 - Service Disruption |
Incident Summary | No SMS messages were sent to edu-ID users via an internal API. This particularly affected the 2-step login for users without TOTP. In addition, no e-mails (e.g. for password resets) were sent via the same API. TOTP authentication was not affected. |
Affected users | In total, requests from 2687 different users failed. We therefore conclude that almost 2700 users saw at least one error during MFA login with SMS or during mobile/e-mail verification. |
Root cause analysis | Communication between two internal APIs failed because an X.509 certificate expired after its automatic renewal had failed. |
Resolution and recovery | The acme-cert-renewal services for the client certificates were restarted manually on all nodes of the internal API. |
Preventive measures, future actions and other learnings | |
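In the 18.9.2024 incident, a certificate stopped being accepted once its `notAfter` time had passed, so a failed automatic renewal can go unnoticed until the certificate actually expires. The sketch below shows how the remaining validity can be computed from a `notAfter` string with Python's standard `ssl.cert_time_to_seconds`. The function names and the 14-day threshold are hypothetical and do not describe the edu-ID monitoring setup.

```python
import ssl
import time
from typing import Optional

# Hypothetical alert threshold; not taken from the incident report.
WARN_DAYS = 14


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative once expired).

    `not_after` uses the format returned by ssl.SSLSocket.getpeercert(),
    e.g. "Jan 18 00:00:00 2025 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400


def needs_attention(not_after: str, now: Optional[float] = None) -> bool:
    # An expired or soon-to-expire certificate suggests that automatic
    # renewal has not completed successfully.
    return days_until_expiry(not_after, now) < WARN_DAYS


if __name__ == "__main__":
    # Example dates only; not the certificate from the incident.
    now = ssl.cert_time_to_seconds("Jan 10 00:00:00 2025 GMT")
    print(days_until_expiry("Jan 18 00:00:00 2025 GMT", now))  # 8.0
    print(needs_attention("Jan 18 00:00:00 2025 GMT", now))    # True
```

Passing `now` explicitly keeps the check deterministic for testing; in a periodic job it would default to the current time.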
Date, Time, Duration | 16.9.2024, 07:50, 2 hours |
---|---|
Severity | Level 2 - Service Disruption |
Incident Summary | On Monday morning at around 08:00, the start of the fall semester at all Swiss universities, it was noticed that the edu-ID login failed or was delayed for some users, while for others it went through smoothly. |
Affected users | Because not all users were affected and many users eventually managed to log in after a few attempts, it is difficult to estimate the number of affected users. However, the edu-ID team and the Switch front desk received about 200 additional support tickets, several phone calls and direct e-mails. It is estimated that several thousand users were affected. |
Root cause analysis | Several factors played a role in this issue. Two of them were the high number of user logins (about 5x higher than in past weeks) due to the semester start and the increased usage of MFA. However, the actual cause was a missing index on a database table, which, combined with these factors, consumed a lot of CPU. Even though load tests were performed on the internal MFA API before it was enabled in spring 2024, and even though the MFA API had been used for months without problems, the issue remained hidden until a massive number of logins by many different users triggered it. |
Resolution and recovery | Creating the missing database index immediately resolved the issue. |
Preventive measures, future actions and other learnings | |
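The root cause of the 16.9.2024 incident, a missing index that made a frequent lookup expensive under heavy load, can be illustrated with SQLite's `EXPLAIN QUERY PLAN`. This is only an analogy: the report does not name the database engine or the table, so the `mfa_challenge` table and its `user_id` column are hypothetical. Without an index, the lookup is planned as a full table scan, work that grows with the table size on every request; after `CREATE INDEX`, it becomes an index search.

```python
import sqlite3
from typing import Tuple


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query plan details for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the human-readable detail is the last one.
    return " | ".join(row[3] for row in rows)


def demo() -> Tuple[str, str]:
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the MFA API's data.
    conn.execute(
        "CREATE TABLE mfa_challenge ("
        "id INTEGER PRIMARY KEY, user_id TEXT, created_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO mfa_challenge (user_id, created_at) VALUES (?, ?)",
        [(f"user{i % 1000}", "2024-09-16") for i in range(10_000)],
    )
    sql = "SELECT * FROM mfa_challenge WHERE user_id = ?"

    before = query_plan(conn, sql, ("user42",))  # planned as a full table scan
    conn.execute("CREATE INDEX idx_mfa_challenge_user_id ON mfa_challenge (user_id)")
    after = query_plan(conn, sql, ("user42",))   # planned as an index search
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = demo()
    print("without index:", before)
    print("with index:   ", after)
```

The exact plan text differs between SQLite versions (for example `SCAN mfa_challenge` versus `SCAN TABLE mfa_challenge`), which is why the example prints it rather than hard-coding it.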
Term | Level | Definition |
---|---|---|
Minor | 0 | An unplanned interruption to a service or a reduction in the quality of a service, with low impact on users, services or organizations. Level 0 incidents are not publicly reported. |
Disruption | 1 | Partial or short-term disruption to services or compliance. |
Emergency | 2 | Significant and widespread disruption of service or compliance; reputational damage; damage to individuals, including SWITCH staff. |
Crisis | 3 | A situation with serious strategic or reputational damage, or where there is a credible risk to the life or health of individuals. Some incidents trigger crises. |