Caching external content for site performance

I recently had to re-install some Twitter/Tumblr syndication modules on a few sites. The reason: sudden performance issues. The sites were medium-sized and depended heavily on externally syndicated content. You’ve seen these busy sidebars before: a few recent Twitter posts, a few more Tumblr pics, a FeedBurner feed, maybe a dozen Disqus/Echo comments, an Amazon widget. It’s all good and wonderful to spread your content and community around on multiple platforms and syndicate them back on your site. But keep in mind that all these services, while practical and very responsive, are relatively new technologies, run by young start-up companies.

No matter how popular they might be, it’s sometimes difficult for them to scale up their infrastructure when a spike of traffic hits. In fact, the more popular they get, the more expensive I assume it is for them to keep up with bandwidth demand. It’s the same phenomenon as being ‘dugg’ or ‘stumbled’ and not being able to keep up with incoming traffic; the scenario hasn’t changed, only the players have. The simple fact is that so much news happens online these days (as opposed to on radio/TV/cable) that anything newsworthy gets instantly multiplied, re-sent, CC’d, re-tweeted, copy-pasted, and can bring down a server or two.

And if your site happens to syndicate external content from a busy portal, you should be prepared for internet-wide performance hits. The proper solution is to have a site-wide caching module installed and maintain a load-balancing setup, but you might not be able to do that on your site. The next best thing: cache the block (or section) that slows you down.

Often you just want the site to instantly syndicate someone else’s feed, or your own external feed, be it text, images, videos, etc. RSS: Really Simple Syndication. A no-brainer to set up, right? Even more fun to have five such blocks, on every page, right? But keep in mind that even that basic syndication block should come with a standard caching mechanism: some ability to save the incoming content into a static local file on your server and update it every few minutes, or every few hours, whatever your preference. Do the math: if your pages get one hit a second on average, a piece of cached content that’s refreshed every five minutes saves you 299 connection requests to that content provider (60 × 5 − 1) in every five-minute window. It’s a simple technique that can make the difference between ALL your sidebars loading slowly (when some big news hits the web) and a small delay in displaying some external news from a content aggregator. Even with global caching mechanisms in place (ISPs do it, search engines do it, news portals do it), you don’t want to rely on their infrastructure and just be a part of the domino chain. Take a little time to set up your local ‘backup plan’.
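To make that concrete, here’s a minimal sketch in Python of the idea: fetch the feed, save it to a static local file, and only re-fetch once the copy is older than the TTL. The URL, cache path, and five-minute TTL below are placeholder assumptions for illustration, not values from any particular plugin:

```python
import os
import time
import urllib.request

# Hypothetical values for illustration only.
FEED_URL = "https://example.com/feed.rss"
CACHE_FILE = "/tmp/feed_cache.xml"
TTL_SECONDS = 5 * 60  # refresh at most once every five minutes

def get_feed() -> bytes:
    """Return the feed body, hitting the remote host only when the
    local copy is missing or older than TTL_SECONDS."""
    try:
        age = time.time() - os.path.getmtime(CACHE_FILE)
        if age < TTL_SECONDS:
            with open(CACHE_FILE, "rb") as f:
                return f.read()
    except OSError:
        pass  # no cache yet; fall through and fetch

    try:
        with urllib.request.urlopen(FEED_URL, timeout=5) as resp:
            body = resp.read()
        with open(CACHE_FILE, "wb") as f:
            f.write(body)
        return body
    except OSError:
        # Remote host is slow or down: serve the stale copy so their
        # outage doesn't become yours (raises only if no cache exists).
        with open(CACHE_FILE, "rb") as f:
            return f.read()
```

At one page hit per second, this turns roughly 300 upstream requests per five-minute window into a single one, and the stale-copy fallback means a wobbly provider degrades your sidebar to slightly old news instead of a hanging page.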

We all want the latest news and the most up-to-date information, but a little trade-off for your local performance is a good thing to consider. After all, we’ve seen Twitter’s Fail Whale and Tumblr’s error page. We’ve even seen the almighty Amazon cloud servers slow to a crawl. They come back eventually, but they shouldn’t drag your site along while they’re scaling up. Use caching plugins. Update your old syndication methods to newer versions. Thank me later.

PS: And who knows: if enough big news portals used intuitive, balanced caching mechanisms for their incoming streams, maybe big content providers like Twitter/Tumblr/Facebook wouldn’t slow down quite as much. It’s a reasonable trade-off, I hope.
