Questioning the Dependence on Third-Party Service Providers
According to this Amazon.com S3 Team write-up on the loss of their S3 service’s availability, “very few requests were completing successfully” during the availability event. The problems seem to have started at 08:40, and error rates did not fall back to normal, acceptable levels in the United States until 16:02. Because the write-up gives times to the minute, the duration of the event can be put at 7 hours and 22 minutes. Taking a year to be 8,760 hours (365 days * 24 hours), the maximum uptime they can achieve for the year, assuming no other events occur, is roughly 99.916%.
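The arithmetic above can be checked with a short script. This is only a sketch: the function names are my own, the outage is assumed to start and end on the same day, and the year is taken as 8,760 hours as in the text. It also computes how much downtime a given availability target, such as 99.9%, leaves per year.

```python
from datetime import datetime

# Non-leap year, matching the 365 days * 24 hours used in the text.
HOURS_PER_YEAR = 365 * 24


def outage_hours(start: str, end: str) -> float:
    """Length of a same-day outage in hours, given HH:MM strings."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 3600


def max_uptime_percent(downtime_hours: float) -> float:
    """Best yearly uptime still possible after the given downtime."""
    return 100 * (1 - downtime_hours / HOURS_PER_YEAR)


def downtime_budget_hours(target_percent: float) -> float:
    """Hours of downtime per year that an availability target allows."""
    return HOURS_PER_YEAR * (1 - target_percent / 100)


if __name__ == "__main__":
    lost = outage_hours("08:40", "16:02")
    print(f"outage length:        {lost:.2f} h")
    print(f"best possible uptime: {max_uptime_percent(lost):.3f}%")
    print(f"99.9% yearly budget:  {downtime_budget_hours(99.9):.2f} h")
```

Comparing the outage length with the yearly budget shows how much room a service built on S3 would have left for any failures of its own.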
Certainly this event is significant for services that depend on S3 and are contractually obligated to deliver even just “three nines”, or 99.9%, availability: that commitment allows only about 8.76 hours of downtime per year, and this single event lasted more than seven hours. I have seen software implementations, as well as whole companies, that rely on the performance of a third party they cannot control. This S3 event is merely another example of what can occur even at a corporation as large as Amazon, and it should dissuade people from building their critical systems around third-party services, like S3, that they do not control.