Chet dating system
Since going open source a year ago (we had the gall to release it on April Fool's Day, 2007), the Thrift project has steadily grown and improved (with multiple iterations on the Erlang binding).
Cache misses and database failure can be detected by the non-database layers and either reported to the user or worked around using replication.
While this architecture works pretty well in general, it isn't as successful in a chat application due to the high volume of long-lived requests, the non-relational nature of the data involved, and the statefulness of each request.
With our web, desktop (Windows, Mac OS, Linux (Deb, RPM)), mobile (i OS, Android) and Live Chat clients and SDKs you chat from anywhere.
When you are offline, our robust notification system will make sure you never miss anything important.
The naive implementation of sending a notification to all friends whenever a user comes online or goes offline has a worst case cost of O(average friendlist size * peak users * churn rate) messages/second, where churn rate is the frequency with which users come online and go offline, in events/second.
Because the data is persisted, a failed read request can be re-attempted.
Even without accounting for the sizeable overhead of spawning an OS process that, on average, twiddles its thumbs for a minute before reporting that no one has sent the user a message, the waiting time could be spent servicing 60-some requests for regular Facebook pages.
The result of running out of Apache processes over the entire Facebook web tier is not pretty, nor is the dynamic configuration of the Apache process limits enjoyable.
Fault tolerance is a desirable characteristic of any big system: if an error happens, the system should try its best to recover without human intervention before giving up and informing the user.
The results of inevitable programming bugs, hardware failures, et al., should be hidden from the user as much as possible and isolated from the rest of the system.