2024/07/19 Native Fax Major Incident (Resolved)
Event Description: The Native Fax server, which processes inbound and outbound faxes for Portal Fax, physical faxes, and Fax to Email, failed to process inbound and outbound faxes.
Event Start Time: 2024-07-19 10:30 EST
Event End Time: 2024-07-19 15:51 EST
RFO Issue Date: 2024-07-26
Affected Services:
- All Inbound and Outbound Faxing on Native Fax
Event Summary
Early in the incident, outbound faxes failed with the error “30236 Gateway not responding to dial request” when users attempted to send. During the investigation, the software reported a second error, “Port Server Busy”, and we found that incoming faxes were also not being received by the Native Fax software. Working with our software vendor, we isolated the issue to the “Port Server” service. Multiple restarts of the service were required to restore Native Fax functionality.
Event Timeline
July 19, 2024
- 10:30a EST Our engineers became aware of outbound faxing failures via Native Fax.
- 10:37a EST An announcement was sent out advising of the outage.
- 11:00a EST Our engineers attempted to restart the Port Server service on the Native Fax server to restore service.
- 11:47a EST After isolating the failures to faxing via ATA only, an announcement was posted advising users to utilize faxing via the manager portal or email.
- 12:13p EST After multiple restarts of fax services, we engaged the fax vendor for further assistance.
- 12:43p EST Inbound faxing via ATA was restored successfully, and our engineers continued troubleshooting with the vendor to resolve outbound faxes via ATA.
- 1:00p EST After identifying that faxes were not processing at all, our engineers isolated the root cause to the Port Server service on the fax server and began recreating the service to attempt to resolve the problem.
- 3:00p EST As part of troubleshooting, the fax vendor installed an updated Port Server service.
- 3:24p EST The fax vendor advised adjusting the inbound and outbound fax port counts so that the backlog of failed outbound faxes could pass through. This required an additional restart of the Port Server service.
- 3:30p EST Inbound and outbound faxes on all services began processing successfully.
- 3:51p EST After further monitoring and confirmation that the failures were no longer occurring, final resolution was declared and announced.
Root Cause
Both error messages, “30236 Gateway not responding to dial request” and “Port Server Busy”, were caused by the “NSX Port Server” service being unable to process requests due to memory conflicts with a NetSapiens integration module. During troubleshooting, we also identified that some faxes were not being processed at all and were dropped immediately.
Future Preventative Action
- We have implemented additional alerts so that a recurrence is identified more quickly.
- We have identified a possible resolution path that would shorten recovery if this issue recurs. It requires further testing to confirm.
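As an illustration of the kind of alerting described above, the following is a minimal sketch and not the monitoring actually deployed. It assumes the fax software's log lines can be read as text with a timestamp. The class name, threshold, and window length are hypothetical; only the two error strings come from this report.

```python
from collections import deque
from dataclasses import dataclass, field

# Error strings quoted in this report; matching is case-insensitive.
ERROR_SIGNATURES = (
    "30236 Gateway not responding to dial request",
    "Port Server Busy",
)


@dataclass
class ErrorRateAlert:
    """Fires when too many known fax errors occur within a sliding window.

    The threshold and window values are hypothetical defaults chosen
    for illustration, not values taken from the report.
    """

    threshold: int = 5
    window_seconds: float = 300.0
    _hits: deque = field(default_factory=deque)

    def observe(self, timestamp: float, log_line: str) -> bool:
        """Record one log line and return True if the alert should fire."""
        lowered = log_line.lower()
        if any(sig.lower() in lowered for sig in ERROR_SIGNATURES):
            self._hits.append(timestamp)
        # Discard matches that have aged out of the window.
        while self._hits and timestamp - self._hits[0] > self.window_seconds:
            self._hits.popleft()
        return len(self._hits) >= self.threshold
```

A caller would feed each new log line to `observe` with its timestamp in seconds and page the on-call engineer when it returns `True`. Counting matches within a window, rather than alerting on a single match, is one way to avoid paging on an isolated failed fax.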