2024-04-30 GRR Loss of Network Connectivity (Resolved)
Event Description: The SNAPHosted server oit-core-041-grr.cls.iaas.run experienced a period of intermittent network connectivity.
Event Start Time: 2024-04-30 15:51 EDT
Event End Time: 2024-04-30 17:28 EDT
RFO Issue Date: 2024-05-02
Affected Services:
- Inbound and outbound calling
- Phone and device registration
Event Summary
The GRR server was intermittently unavailable during a brief upstream network incident. During this time, active calls were dropped and device registrations failed over to other cores. After network stability was confirmed, an announcement was published confirming the restoration of service.
Event Timeline
April 30th, 2024
- 15:51 EDT - CSE Team alerted to HTTPS Health Check / unresponsive and immediately began triaging server status
- 15:54 EDT - Confirmed a significant decrease in calls and registrations on the GRR core
- 15:54 EDT - Second recorded ping availability notification and decrease in device registration
- 15:56 EDT - Confirmed with our support vendor that the incident is not with our equipment and is located upstream
- 16:05 EDT - Third recorded ping availability notification and decrease in device registration
- 16:08 EDT - Sent a request to data center support for confirmation of network outages
- 16:13 EDT - Fourth recorded ping availability notification and decrease in device registration
- 16:17 EDT - Announcement posted to all channels
  - Hello OIT family, we are currently tracking a connectivity issue with the GRR datacenter that started at 15:50 EDT. Calls may have been briefly impacted if utilizing that core. Device registrations shifted to the other cores as designed and we are watching registrations return to their normal homes. We will update you after we have confirmed availability and stable operations.
- 16:26 EDT - System connectivity stabilized and no further alerts were triggered
- 17:00 EDT - Received confirmation from our data center support that an unplanned network event was located upstream of the GRR data center
- 17:28 EDT - Final announcement posted to all channels
  - All metrics regarding the connectivity issue with the GRR data center are showing normal values. As such, we are marking this closed. We will follow up with our hosted partner to identify any specific issues and mitigate future problems. If you continue to experience issues please contact our support and we will triage on a case-by-case basis. MIR will follow up within the next 48 hours.
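For readers who want to check the timeline arithmetic, the sketch below works out the length of the instability window (first alert to stabilization), the full event duration (first alert to final announcement), and the gaps between the four recorded alerts. The timestamps are copied from the timeline above. The variable and function names are illustrative assumptions and are not taken from any monitoring tooling used during the incident.

```python
from datetime import datetime

# Timestamps copied from the timeline above (all EDT, 2024-04-30).
FMT = "%Y-%m-%d %H:%M"
alerts = [
    "2024-04-30 15:51",  # HTTPS health check unresponsive
    "2024-04-30 15:54",  # second ping availability notification
    "2024-04-30 16:05",  # third ping availability notification
    "2024-04-30 16:13",  # fourth ping availability notification
]
stabilized = "2024-04-30 16:26"  # connectivity stabilized
closed = "2024-04-30 17:28"      # final announcement posted


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes elapsed from start to end."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


def gaps(times: list[str]) -> list[int]:
    """Return the minutes between each pair of consecutive timestamps."""
    return [minutes_between(a, b) for a, b in zip(times, times[1:])]


instability_minutes = minutes_between(alerts[0], stabilized)
event_minutes = minutes_between(alerts[0], closed)
alert_gaps = gaps(alerts)

print(f"Instability window: {instability_minutes} min")
print(f"Event duration:     {event_minutes} min")
print(f"Gaps between alerts: {alert_gaps} min")
```

On these figures, connectivity was unstable for 35 minutes, and the event remained open for 97 minutes while stability was being confirmed.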
Root Cause
We are working with our service provider to isolate the root cause and will update this article once we have confirmation.
Future Preventative Action
Once the root cause has been identified, we will explore options to mitigate similar incidents in the future.