- Nationwide disruption of voice and text services for more than six hours.
- FCC Chairman calls for an investigation.
- TMUS says “trigger event” was a failed leased fibre circuit.
Ajit Pai, Chairman of the Federal Communications Commission, called an extensive network disruption on 15 June 2020, which left customers of T‑Mobile US unable to make calls or send text messages, “unacceptable”.
Reacting swiftly to the disruption, and no doubt concerned that emergency 911 calls could not always be made during the six hours the network was malfunctioning, Pai took to Twitter to vent his annoyance. “We’re demanding answers, and so are American consumers”, he tweeted. Pai added that the FCC was launching an investigation.
IP hits the buffers
Top T‑Mobile US executives issued statements before Pai’s intervention. Mike Sievert, Chief Executive at TMUS, blamed the “intermittent” problem on an “IP traffic related issue”. He said it was caused by “significant capacity issues in the network core”. Sievert maintained that data services were “working throughout the day”.
Neville Ray, President of Technology, subsequently gave a bit more detail. He pinned the blame on an “IP traffic storm” that made the IP Multimedia Subsystem (IMS) keel over, disrupting voice-over-LTE (VoLTE) calls. Non-VoLTE calling, apparently, worked fine.
Ray said TMUS engineers had worked with IMS and IP vendors to “add permanent additional safeguards”. He added that an investigation was being carried out to determine the cause of the initial overload failure. Ray did not name the third-party provider of the leased circuit that formed part of the “trigger event”, other than to say it hailed from the south-eastern US.
“This is something that happens on every mobile network, so we’ve worked with our vendors to build redundancy and resiliency to make sure that these types of circuit failures don’t affect customers. This redundancy failed us and resulted in an overload situation that was then compounded by other factors. This overload resulted in an IP traffic storm that spread from the south-east to create significant capacity issues across the IMS core network that supports VoLTE calls”, said Ray.
We’ve been here before
T‑Mobile US has suffered high-profile network issues periodically over the last few years (Deutsche Telekomwatch, #61, #70, #86, and #87). In August 2019, it was hit by a wide‑scale service disruption, although it remained coy about the causes. Media coverage suggested the outage prevented users from making calls and sending text messages but, as with the latest glitch, mobile data services were apparently not hit. As Sievert did in his first reaction to this week’s hoo-hah, the operator presented the issues as merely “intermittent”.