2022-07-14 - Device Provisioning Server Unreachable (Resolved)
Event Description: The ATL Provisioning server was unreachable, causing new device provisioning to fail.
Event Start Time: 2022-07-14 5:44 PM EST
Event End Time: 2022-07-14 5:55 PM EST
Event Report Issue Date: 2022-07-14
Affected Services
- New device provisioning
- Logins to the device provisioning server
Event Timeline (All times in 24-hour format, EST)
July 14th, 2022
- 1744 - Our monitoring tools reported the device provisioning server was unreachable
- 1746 - Engineering began investigating the issue
- 1753 - Confirmed that the hypervisor hosting the device provisioning server rebooted on its own
- 1755 - The hypervisor and device provisioning server came back online
- 1808 - Notified partners and clients of outage/resolution via Discord and status page
- 1910 - Root cause identified as a forced system update that bypassed the server's update policy
- 1912 - Updated the server's update policy to exclude these forced system updates and prevent a recurrence
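As a rough illustration of the intervals implied by the timeline above, the sketch below computes the outage duration, the time from alert to investigation, and the delay between recovery and partner notification. The timestamps are copied from the timeline; the dictionary keys and helper names are hypothetical and exist only for this example.

```python
from datetime import datetime

# Timestamps copied from the timeline above (EST, 24-hour format).
# The labels are illustrative names, not part of any monitoring tool.
DAY = "2022-07-14"
events = {
    "alert": "1744",
    "investigation_started": "1746",
    "reboot_confirmed": "1753",
    "service_restored": "1755",
    "partners_notified": "1808",
    "root_cause_identified": "1910",
    "policy_updated": "1912",
}

def at(label):
    """Parse one timeline entry into a datetime."""
    return datetime.strptime(f"{DAY} {events[label]}", "%Y-%m-%d %H%M")

def minutes_between(start, end):
    """Whole minutes elapsed between two timeline entries."""
    return int((at(end) - at(start)).total_seconds() // 60)

outage_minutes = minutes_between("alert", "service_restored")
response_minutes = minutes_between("alert", "investigation_started")
notify_delay_minutes = minutes_between("service_restored", "partners_notified")

print(f"Outage duration: {outage_minutes} min")
print(f"Alert to investigation: {response_minutes} min")
print(f"Recovery to notification: {notify_delay_minutes} min")
```

The 11-minute outage figure agrees with the event start and end times stated at the top of this report.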
Root Cause Analysis
System updates on the server were enabled. While they were enabled, a system update deemed critical forced the server to restart, bypassing the server's restart policy.
Future Preventative Action
We have changed the server's system update policy to manual. This prevents the server from updating automatically and allows us to reboot it when necessary, during an appropriate maintenance window.
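One way to picture the manual policy is as a gate that permits a reboot only when an engineer has approved it and the current time falls inside a maintenance window. The sketch below is a minimal illustration under stated assumptions: the window hours, `in_maintenance_window`, and `may_reboot` are hypothetical, since this report does not specify the actual window or the tooling that enforces it.

```python
from datetime import datetime, time

# Hypothetical window: this report does not state the real maintenance hours.
WINDOW_START = time(2, 0)
WINDOW_END = time(4, 0)

def in_maintenance_window(now, start=WINDOW_START, end=WINDOW_END):
    """Return True if `now` falls inside the window [start, end)."""
    t = now.time()
    if start <= end:
        return start <= t < end
    # The window wraps past midnight (for example 23:00 to 01:00).
    return t >= start or t < end

def may_reboot(now, approved):
    """Allow a reboot only with manual approval and inside the window."""
    return approved and in_maintenance_window(now)

# The unplanned restart in this incident was detected at 17:44,
# which is outside the assumed window.
print(may_reboot(datetime(2022, 7, 14, 17, 44), approved=True))
```

Under these assumptions, a restart at the time of this incident would have been refused, while an approved restart at 02:30 would be allowed.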