Major routing issues on the Internet
Incident Report for Pingdom

The Outage

A backbone carrier made a configuration error that caused routing issues across much of the world. More details here and at Cloudflare's status page here.

How did it affect Pingdom services?

Pingdom probe servers around the world could not send data to us quickly enough for alerts to go out in a timely manner. This, coupled with the huge increase in the number of outages reported and the alerts they triggered, caused delays.

Simplified and Basic alerting experienced some short delays, whereas the less commonly used BeepManager alerts were significantly delayed, in some cases by up to 20 minutes.

The huge number of outages reported simultaneously, along with My Pingdom’s reliance on Cloudflare, resulted in problems accessing My Pingdom and a general slowdown of the service for the duration of the incident.

What will we do to remedy this?

In the face of recent huge outages (such as the DynDNS outage in October 2016), we are working on rebuilding the core of our alerting and reporting to mitigate the impact that huge internet outages have on our service. We cannot, of course, promise 100% uptime on our service, since we are on the internet after all, but we’re working on making things better.

Posted 6 months ago. May 03, 2017 - 15:17 UTC

Resolved
The alerting queue is now cleared and all services are back to normal. Thank you for your patience. A post-mortem will follow tomorrow!
Posted 6 months ago. May 02, 2017 - 16:25 UTC
Monitoring
The issue has been mostly resolved (the Internet is back, yay!).
Access to My Pingdom and pingdom.com has been restored.

We are still seeing some delayed alerts because of the huge queues that formed. This affects BeepManager alerting in Pingdom, but we expect it to catch up quickly.

We'll continue to monitor the situation and will close this incident as soon as the queues are cleared.
Posted 6 months ago. May 02, 2017 - 16:05 UTC
Identified
This looks like major routing issues on the internet.

Cloudflare is also reporting on this issue here: https://www.cloudflarestatus.com/incidents/51q3xhq8w7t8

Right now we are observing issues accessing My Pingdom and pingdom.com. Our alerting and monitoring are working, but some users may see major delays.
Posted 6 months ago. May 02, 2017 - 15:16 UTC
Investigating
Our operations team has discovered that something is wrong on the internet. We suspect a problem at a major ISP or similar. This affects all our services, to a degree not yet known. We will continue to investigate and post updates here.
Posted 6 months ago. May 02, 2017 - 14:57 UTC