TomsTechBlog.com

It's hard to say these days

The (Over) Dramatic World of Twitter (and a word on scale)

May 30, 2008 13:16 by author Tom

So this is kind of funny...

On February 8th, 2008, Israel of AssetBar.com wrote a really informative post entitled "Twitter Proxy: Any Interest?"  In it he pointed out this fundamental problem with Twitter...

Nothing is as easy as it looks. When Robert Scoble writes a simple “I’m hanging out with…” message, Twitter has about two choices of how they can dispatch that message:

  1. PUSH the message to the queue’s of each of his 6,864 followers, or
  2. Wait for the 6,864 followers to log in, then PULL the message.

 

The trouble with #2 is that people like Robert also follow 6,800 people. And it’s unacceptable for him to login and then have to wait for the system to open records on 6,800 people (across multiple db shards), then sort the records by date and finally render the data. Users would be hating on the HUGE latency.
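To make the two options in the quote above concrete, here is a minimal Python sketch. The class and method names (PushTimeline, PullTimeline, follow, post, timeline) are invented for illustration; this is not Twitter's actual design, just the smallest in-memory model that shows where the work lands in each approach.

```python
from collections import defaultdict


class PushTimeline:
    """Option 1 (hypothetical model): copy each post into every follower's queue."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> users who follow them
        self.queues = defaultdict(list)    # user -> [(timestamp, author, text)]

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, timestamp, text):
        # One write per follower: a single post by an author with
        # thousands of followers turns into thousands of writes.
        for follower in self.followers[author]:
            self.queues[follower].append((timestamp, author, text))

    def timeline(self, user):
        # Reading is cheap: the queue was assembled at write time.
        return sorted(self.queues[user], reverse=True)


class PullTimeline:
    """Option 2 (hypothetical model): store each post once, gather at read time."""

    def __init__(self):
        self.following = defaultdict(set)  # user -> authors they follow
        self.posts = defaultdict(list)     # author -> [(timestamp, author, text)]

    def follow(self, follower, author):
        self.following[follower].add(author)

    def post(self, author, timestamp, text):
        # One write, no matter how many followers the author has.
        self.posts[author].append((timestamp, author, text))

    def timeline(self, user):
        # One lookup per followed author, then a sort by date: this is
        # the read-time latency the quote complains about.
        merged = []
        for author in self.following[user]:
            merged.extend(self.posts[author])
        return sorted(merged, reverse=True)
```

In this toy model both classes return the same timeline for posts made after the follow; the only difference is whether the cost is paid when a message is written (push) or when a timeline is read (pull).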

This post was later quoted by Dare Obasanjo, which led to an angry response from Robert Scoble in which he said...

First of all, Twitter doesn’t store my Tweets 25,000 times. It stores them once and then it remixes them. This is like saying that Exchange stores each email once for each user. That’s totally not true and shows a lack of understanding how these things work internally.

Second of all, why can FriendFeed keep up with the ever increasing load? I have 10,945 friends on FriendFeed (all added in the past three months, which is MUCH faster growth than Twitter had) and it’s staying up just fine.

Now, with no offense to Scoble, this is a fundamentally ignorant assessment of what is actually going on, a fact that almost every technically minded person pointed out to him.  The best of those posts, imho, was from Nick Halstead, where he says...

In a recent post Robert Scoble tries to explaining how twitter works by saying that twitter is using some form of ‘pivot table‘ - (my terminology for what he explains) and says that a model that others have put forward (i.e. a de-normalized system of inserting messages into everyones queues) was akin to microsoft exchange, now these two examples are so horribly not connected - and I won’t rant about how BAD exchange is efficiency wise, but please Robert do not get into any technical arguments please.

(It's a great post that the above quote doesn't do justice to, so everyone should check it out.)

This led to the same question being presented to the authors of the official Twitter technology blog.  Their reply was to basically confirm what everyone had been saying...

charles asks if there's anything users can do to lighten our load. The events that hit our system the hardest are generally when "popular" users - that is, users with large numbers of followers and people they're following - perform a number of actions in rapid succession. This usually results in a number of big queries that pile up in our database(s). Not running scripts to follow thousands of users at a time would be a help, but that's behavior we have to limit on our side.

Scoble quickly took offense to that response, saying in his link blog...

“This is total bullshit. Why do I have 11,556 subscribers on FriendFeed, I'm FAR FAR FAR FAR more active on FriendFeed, and yet FriendFeed never has gone down on me? Also, Twitter went down at its first SXSW before I had a ton of followers there. Twitter has major problems, they still don't have a good engineering answer, and so they are blaming their most popular users. Great. We get the message. We'll go someplace where there's a good engineering team. You know, the guys who invented Gmail and Google Maps? They are the ones behind FriendFeed. See ya Twitter!”

So there you have it, a little background on what has become a huge mess for Twitter. 

Honestly, I don't have much of an opinion on the human drama part of this (other than to find it pretty amusing).  Is Scoble being unreasonable?  Absolutely.  But that's somewhat understandable when you consider that everyone is pointing their fingers at him when he essentially did nothing wrong.  The truth is, Twitter loved Scoble signing up 25,000+ followers when it meant drawing users into the service.

So for him to be slapped on the hand now for it is a bit obnoxious. 

As for FriendFeed, there are two issues.  One, the model by which people monitor other people's FriendFeeds is not like the model Twitter uses, which lowers FriendFeed's back-end load; and two, FriendFeed has far fewer users.  If they were trying to deal with as many users as Twitter, I suspect they'd be having problems as well.

One final point I'd like to make is regarding the root cause of Twitter's woes.  A lot of people have tried to lay this at the feet of Ruby on Rails, which is a bit unfair.  That being said, Twitter's problems do belong at the feet of the "Rails Philosophy" set down by its creators, 37Signals.  Here's a quote from their e-book "Getting Real"...

You don't have a scaling problem yet

"Will my app scale when millions of people start using it?"

Ya know what? Wait until that actually happens. If you've got a huge number of people overloading your system then huzzah! That's one swell problem to have. The truth is the overwhelming majority of web apps are never going to reach that stage. And even if you do start to get overloaded it's usually not an all-or-nothing issue. You'll have time to adjust and respond to the problem. Plus, you'll have more real-world data and benchmarks after you launch which you can use to figure out the areas that need to be addressed.

I've never thought much of the 37Signals gang, and quotes like the one above are the reason why.  Putting off your most difficult technical tasks until later is an utterly stupid thing to do and is, as Twitter is now finding out, disastrous if you can't quickly address the problems.  37Signals wouldn't necessarily know that, though, because their most popular application has fewer users than a program I wrote at 19.  Yet they still speak as, and are treated like, leading experts in application design.

They're not. 

Not only that, a lot of their advice is outright bad, as this Twitter situation proves.  This is all symptomatic of a larger problem on the web, which is bloggers not questioning who they treat as an authority.  Anyone who realistically looks at 37Signals will see that they are still a fairly small development firm.  That doesn't mean they haven't done some impressive things or that their opinion has no merit at all.  But it needs to be put in context, and people don't seem to be doing that.

Had the Twitter architects done that they wouldn't be in the situation they're in now. 

Addendum:  This occurred to me as I was hitting the publish button on the above post.  When you go to the bank for a new business loan, what is the first thing they ask you for?  A 2-Year Plan.  That's because no sane institution lends money to anyone who has no idea where they are going or how they plan to get there.  Given that, I ask you: doesn't the same apply to technology issues, and in particular to scaling?



Google App Engine Follow up: A Word on Scaling

April 14, 2008 15:12 by author Tom

I wanted to address one issue specifically, since I still think people have a weak grasp of what it means.  That issue is scaling.  Two scaling arguments have been presented to me in the responses to my last post, so I address both below.

Scaling for an Individual Application

One of the arguments presented has been that Google's solution allows applications to scale beyond the point that a shared host could match.  This is because, according to those who make the argument, Google has massive resources that they can employ if your application grows into a high traffic site.

What people are missing here is that, once you get to a certain size, it becomes cost effective to host your own site.  No matter how efficient a company like Amazon or Google is they have to charge an overhead to make money.  Once you get to the point where you can afford to hire a full time server person and maintain a server farm for yourself there's no way those companies can compete because of the overhead they have to charge. 
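The break-even argument above can be written down as a small Python sketch. Every number and name here is hypothetical (the 25% overhead rate, the monthly staffing cost, the function names), and the model makes the simplifying assumption that your raw infrastructure cost is the same whether you run it yourself or the provider runs it for you.

```python
def hosted_cost(raw_cost, overhead_rate):
    """Monthly bill when a provider resells raw_cost of capacity plus its overhead."""
    return raw_cost * (1 + overhead_rate)


def self_hosted_cost(raw_cost, fixed_staff_cost):
    """Monthly cost of running the same capacity yourself, plus a full-time server person."""
    return raw_cost + fixed_staff_cost


def break_even_raw_cost(fixed_staff_cost, overhead_rate):
    """Raw capacity cost at which both options cost the same.

    raw * (1 + rate) = raw + fixed  =>  raw = fixed / rate
    """
    return fixed_staff_cost / overhead_rate


# Hypothetical figures: 8,000 per month for staff, 25% provider overhead.
threshold = break_even_raw_cost(fixed_staff_cost=8_000, overhead_rate=0.25)
print(threshold)  # prints 32000.0
```

Below that threshold the provider is cheaper under these assumptions; above it, the overhead you pay exceeds what it would cost to staff the servers yourself.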

So people who think that one of the benefits of the Google App Engine is that it can scale into infinity miss the point that no one would ever need that, even if Google were providing it.

Which Google isn't, and that brings me to my second point...

Scaling for Multiple Applications on Google's Server

There's been much confusion over one thing in my previous post.  That thing was...

Google App Engine's ability to scale depends on how much server resources Google is willing to dedicate to the task of running these applications.  Google is not going to risk slowing down their primary services for a Google App Engine application.  So their ability to scale could very well be less than other companies, we just don't know.

So I thought I'd elaborate on that.  One of the arguments people have made is that Google's massive data centers give them the capability to completely eliminate performance drags.  So Google App Engine applications will perform faster because they'll never have to wait for CPU resources.

What this line of reasoning misses is that we don't know how many of Google's computers they are willing to dedicate to the task of hosting App Engine applications.  So while a normal shared host might put 100 websites on every computer, that still might outperform Google if Google's Server-To-Application ratio is worse (that is, if Google runs more applications per server).  For example, even if Google has 300,000 computers dedicated to Google App Engine and your shared host only has 100 computers to its name, the two could very well be even in performance if Google has 30,000,000 App Engine applications to run: both work out to 100 applications per computer.
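The arithmetic behind that comparison is easy to check. The figures below are the illustrative ones from the paragraph above, not real Google numbers, and the helper name apps_per_server is made up for this sketch.

```python
def apps_per_server(applications, servers):
    """Average number of applications sharing each server."""
    return applications / servers


# Shared host: 100 computers, each carrying 100 websites.
shared_host = apps_per_server(applications=100 * 100, servers=100)

# Hypothetical App Engine figures from the example: 30,000,000 apps on 300,000 computers.
app_engine = apps_per_server(applications=30_000_000, servers=300_000)

print(shared_host, app_engine)  # prints 100.0 100.0
```

A bigger data center only helps if the number of applications sharing it grows more slowly than the number of servers.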

In fact, I'd argue that Google's popularity and the fact that the service is free make App Engine more likely to end up with more applications per server than your average web host.

In the End...

When all is said and done my point still stands.  Scalability really doesn't play a part when deciding between a web host and a service like Google or Amazon.  It all boils down to price.  Since Google has chosen not to reveal their pricing yet there's no way to compare.  That is exactly why it is too early to be singing the praises of Google's solution (which was the point of my last post).



About Me

Not really relevant right now. This blog is on hiatus. I really haven't decided if it is an indefinite hiatus yet.

For the record, if you've tried to e-mail me over the last 4 to 6 months, I didn't mean to ignore you. The e-mail forwarding isn't working, and I didn't realize that until months' worth of e-mails had been deleted on forward. The tom@tomstechblog.com address still won't forward to the postmaster account, and I don't know why, because it's provided by the webhost. But if you're one of my old blog pen pals, I would always welcome an e-mail from you at the postmaster@tomstechblog.com address.

    Disclaimer

    The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

    © Copyright 2013
