2024-03-02 LAS Core Dump (Resolved)
Event Description: LAS Core Dump
Event Start Time: 2024-03-02 13:06 EST
Event End Time: 2024-03-02 13:23 EST
RFO Issue Date: 2024-03-04
Affected Services:
- Inbound and Outbound call services from the LAS datacenter.
Event Summary
The core NMS service crashed, causing active calls to drop and registrations to shift to other servers as part of their failover plan.
Event Timeline
March 2nd, 2024
- 13:06 EST - Core dump logged on oit-core-021-las.cls.iaas.run, resulting in service degradation
- 13:11 EST - OIT Engineers alerted to SIP health checks failing for LAS
- 13:17 EST - System logs reported Core as back online
- 13:19 EST - OIT Engineers isolated the cause of the health check failures to a core dump
- 13:32 EST - System logs showed SBUS, registration counts, and other services had returned to normal
- 13:33 EST - Announcement posted:
- At 1313 EST we received a notification from our Las Vegas core of a service interruption. Failover is working as intended and only active calls would have been impacted. Services are returning to normal and we are currently investigating. Next update by 1400est
- 14:06 EST - Final announcement posted:
- We are continuing to monitor performance and services have returned to normal. We will continue to monitor and follow up with our MIR in the next 48 hours.
Root Cause
The core service sipbx hit a segmentation fault, which produced a core dump. The segmentation fault was linked to known bug NMS-2659.
Future Preventative Action
The bug that caused this failure is resolved in platform update v44 and higher. We are already testing that platform version internally and will upgrade once all tests are complete and a maintenance window has been scheduled.