2021-11-11 Outbound Call Failures from Atlanta (Resolved)
Event Description: Devices registered to Atlanta nodes were unable to place outbound calls.
Event Start Time: 2021-11-11 19:32 EST
Event End Time: 2021-11-11 22:30 EST
RFO Issue Date: 2021-11-13
Affected Services
Outbound calling for devices registered to Atlanta nodes.
Event Timeline (All times 24-hour format, EST)
November 11th, 2021
- 19:32 We began receiving reports of 503 SIP responses on outbound calls. Shortly afterward, users began reporting that they could not place outbound calls. Existing active calls and inbound calls were unaffected.
- 19:34 Engineers began reviewing case and log data.
- 19:59 Senior and platform engineers engaged to review data.
- 20:17 Logs showed a large amount of memory being consumed by call activity.
- 20:33 Atlanta set to maintenance mode. All calls and devices were redirected to other nodes so that emergency adjustments could be performed.
- 20:48 Adjustments applied and Atlanta nodes restarted
- 20:54 Atlanta put back in service. Verified devices and calls set to Atlanta as their home returned to service normally.
- 21:27 Logs revealed a large number of requests coming from a French IP address. Research determined that the IP belonged to a malicious network. The IP was added to our filtering services and its connections were closed.
- 22:30 All services remain nominal. Incident closed.
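The 21:27 step, spotting a single address responsible for an outsized share of requests, can be illustrated with a minimal sketch. This is not the tooling used during the incident. The function name, the threshold, and the sample addresses (taken from the reserved documentation ranges) are all hypothetical, and the input is assumed to be a list of source IPs already extracted from the logs.

```python
from collections import Counter


def top_sources(source_ips, threshold):
    """Return (ip, request_count) pairs at or above threshold, busiest first.

    source_ips is assumed to be an iterable of source-address strings that
    were already parsed out of the request logs (parsing is not shown).
    """
    counts = Counter(source_ips)
    return [(ip, n) for ip, n in counts.most_common() if n >= threshold]


# Hypothetical sample: one address dominates the request volume.
sample = ["203.0.113.7"] * 5 + ["198.51.100.2", "192.0.2.10"]
suspects = top_sources(sample, threshold=3)
```

Any address returned here would still need the kind of manual research described above before being added to a filter.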
Root Cause Analysis
A configuration setting in our platform controlled how quickly memory was released after a call concluded. Although the system had sufficient resources to handle the additional calls from the malicious IP, memory held by completed calls was not being released quickly enough to make room for new calls.
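The effect of a slow release can be sketched with a simple steady-state estimate: memory in use is roughly the call arrival rate multiplied by the total time each call holds memory (call duration plus release delay). Every number below is hypothetical and chosen only to illustrate the relationship. None of them are measurements from our platform, and the function and parameter names are illustrative assumptions.

```python
def steady_state_memory_mb(calls_per_second, avg_call_seconds,
                           release_delay_seconds, mb_per_call):
    """Estimate memory held at steady state.

    Uses Little's law: calls holding memory = arrival rate * holding time,
    where holding time includes the delay before memory is released.
    """
    holding_seconds = avg_call_seconds + release_delay_seconds
    return calls_per_second * holding_seconds * mb_per_call


def has_headroom(capacity_mb, **load):
    """True if the estimated memory in use fits within capacity_mb."""
    return steady_state_memory_mb(**load) <= capacity_mb


# Hypothetical figures, for illustration only.
CAPACITY_MB = 2048
normal = dict(calls_per_second=5, avg_call_seconds=60,
              release_delay_seconds=120, mb_per_call=1)
flood = dict(normal, calls_per_second=20)      # extra calls, same release delay
tuned = dict(flood, release_delay_seconds=10)  # same flood, faster release

normal_ok = has_headroom(CAPACITY_MB, **normal)
flood_ok = has_headroom(CAPACITY_MB, **flood)
tuned_ok = has_headroom(CAPACITY_MB, **tuned)
```

In this toy model the active calls during the flood need only 1200 MB, well within capacity, but the release delay pushes the total past the limit. Shortening the delay brings the same load back under capacity.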
Future Preventative Action
V42 already includes changes to these settings that increase the speed at which memory is freed. The new settings would have prevented this incident. Engineering has applied the same settings to our current platform and confirmed that they are functioning as intended.