What Happens When Web-Scale Computing Becomes a Commodity?

With the announcement of Amazon’s much-anticipated SimpleDB service this week, we now officially live in a world where the kind of enormous systems run by Google, Yahoo, eBay, et al. — systems that power huge portions of the web (where 500+ million users is totally mundane) — are available on demand, in small doses, and at reasonable prices to anyone who needs them. Amazon Web Services now provides all the necessary infrastructure to run applications that host millions of files for download, persist hundreds of millions of database records, and run thousands of processes, all without building or maintaining any physical infrastructure.
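To make that concrete: all three of those capabilities are a handful of API calls away. Here’s a minimal sketch using Python’s boto library (assuming a version with SimpleDB support; the bucket, domain, and AMI names are placeholders I’ve made up, not real resources):

```python
# Hedged sketch of the three AWS building blocks described above.
# Credentials are read from the environment; every resource name here
# is a made-up placeholder.
import boto

# S3: host millions of files for download.
s3 = boto.connect_s3()
bucket = s3.create_bucket('my-app-files')       # placeholder bucket name
key = bucket.new_key('hello.txt')
key.set_contents_from_string('Hello, web-scale world.')

# SimpleDB: persist hundreds of millions of database records.
sdb = boto.connect_sdb()
domain = sdb.create_domain('my-app-records')    # placeholder domain name
domain.put_attributes('record-1', {'user': 'greg', 'score': '42'})

# EC2: run thousands of processes on demand.
ec2 = boto.connect_ec2()
ec2.run_instances('ami-12345678')               # placeholder AMI id
```

That’s the whole surface area; there is no data center anywhere in the picture.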

On this infrastructure, the only real difference between running a small application (a custom CMS for a medium-sized non-profit, for example) and a large one (say, Digg) is the size of your monthly bill. And as other companies besides Amazon enter this market, seeing it as a relatively simple way to monetize their huge existing infrastructure investments, the size of that bill will fall as well.

So, what are we talking about here? Within a few years, a scale of computation that is currently only available to a handful of multi-billion dollar companies will be available to any pair of dorm room-bound hacker kids with $30/mo. and a pair of MacBooks.

Just as the rise of commodity server hardware and open source software revolutionized the web over the last ten years, it would be reasonable to expect changes of similarly breathtaking scope as we undergo the commoditization of web-scale computing over the next ten.

For a few clues as to what happens when the current, if obscure, state of the art becomes an industry-standard lowest common denominator, it helps to look at some history. This has happened at least twice in the last thirty years: once when industry standardization around x86 hardware led to the collapse in prices for DOS- and then Windows-compatible PCs that made them ubiquitous around the world; and again when those same PCs grew powerful enough, and the open source software written for them affordable and reliable enough, to displace expensive, proprietary server systems and radically lower the barrier to entry for web development, leading, as Tim O’Reilly has clearly outlined, to Web 2.0.

Each of these transitions had the same two high-level effects: they made it cheaper to produce professional-caliber work, and they increased the value of openness.

The rise of the ubiquitous, cheap, and powerful PC has created near-universal access to the best digital tools available. Professional accountants, graphic designers, record producers, photographers, and countless others do their work on exactly the same relatively cheap hardware available to the average consumer playing games and writing email.

Similarly, the precipitous drop in the price of web application development and deployment environments caused by the birth of the LAMP stack, and the commodity servers on which it runs, made it possible for startups like del.icio.us and Flickr to launch without first raising millions of dollars in venture funding to buy Sun workstations. (And this doesn’t even take into account the second-order revolution in the content industry caused by the cheap-to-free hosted publishing tools created by these very startups and run on commodity web hosting.)

The openness story is, if anything, even more shocking. Like fruit flies spontaneously generating out of garbage, Linux grew out of the universally available commodity PCs with their high levels of hardware compatibility. The same process that filled the PC landscape with identical gray boxes running Windows made it possible for a few OS geeks from obscure countries to build, in their spare time, an operating system that could feasibly run, for free, on all of them.

Similarly, the commoditization brought about by the triumph of the LAMP stack made it possible for a handful of people to build non-profit web applications like Wikipedia and Craigslist whose only mission is to make useful information universally available to anyone who wants it.

Now: what form will these two kinds of changes take in the world of web-scale computing?

Let’s start with the ability to produce professional-caliber work more cheaply. Until now, it was only feasible to build web-scale applications if the market for them was also web-scale. That is, you only got to use resource-intensive technologies like full web spidering or massive file caching if you were building a mainstream service with a potential audience of 500+ million daily users. This meant the basics: search, ads, maybe games.

But when doing these things only costs a couple hundred bucks a month, a great many smaller markets suddenly become lucrative. Could you build value on top of a dynamically updated list of every mention made of every stock ticker symbol anywhere on the web? How about every mention of every trademark? Or every mention of every mp3? Since the overhead for extracting that value no longer includes building and maintaining enormous data centers, it might actually become feasible to build services with such requirements.
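To make the ticker example concrete, here’s a hypothetical sketch in Python using the boto library; the regex is deliberately naive, and fetch_crawled_pages is an imaginary stand-in for whatever feed of pages a real spider would produce:

```python
# Hypothetical sketch: extract stock ticker mentions from crawled pages
# and persist them to SimpleDB. Everything named here is a placeholder.
import re
import boto

TICKER = re.compile(r'\$([A-Z]{1,5})\b')  # naive: matches $AAPL, $GOOG, ...

def fetch_crawled_pages():
    """Imaginary stand-in for a real crawl feed; yields (url, text) pairs."""
    yield ('http://example.com/post', 'Thinking of buying $AAPL before earnings.')

sdb = boto.connect_sdb()
domain = sdb.create_domain('ticker-mentions')   # placeholder domain name

for url, text in fetch_crawled_pages():
    for match in TICKER.finditer(text):
        ticker = match.group(1)
        # One item per (ticker, url) pair; SimpleDB attribute values are strings.
        domain.put_attributes('%s|%s' % (ticker, url),
                              {'ticker': ticker, 'url': url})
```

The point isn’t the twenty lines of code; it’s that the expensive part, running this over a serious fraction of the web, is now something you rent by the hour instead of build.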

(As a side note, one corollary of such a change is that it’s no longer necessary to take tens of millions of dollars in venture funding in order to run a business that requires web-scale computing. This means even more leverage for startups when negotiating for what little funding they do need, and even shakier times ahead for the VCs out there looking to invest their billions in only a small handful of huge deals.)

And what about openness? What new prospects for collaborative networked volunteer-driven world-improving projects might we see?

How about a Web OS that runs on top of a peer-to-peer network of commodity machines and is available to anyone who contributes some spare cycles to the cause — like a Google-scale Linux install running on top of SETI@home? Or what about a world-wide effort to federate the tracking of all manufactured objects via their RFID tags in order to maximize the efficiency of their recycling, discover any of their toxic effects, and roll back global warming?

These ideas may seem silly or grandiose, but so did Google when Larry and Sergey were still students, or Linux when it was just an excuse for mailing list flame wars. This is one of those times. A lot of new things just became possible.



7 Responses to What Happens When Web-Scale Computing Becomes a Commodity?

  1. Thomas says:

    I’m sorry, but I have to call bullsh*t on this. The mere existence of these services is far from removing the barriers that keep *anyone* from scaling to Amazon/eBay/Google-sized services. There are, as I see it, at least two things still missing that won’t be “commoditized.”
    First is the know-how. Building a scalable application (really, application infrastructure) is not the same as building a basic app. And if anything, the general trends have shown that there is a great wide void of knowledge in the market about how to do this successfully.
    Second is the management of this architecture. Ask any company running a large number of EC2 instances how they are managing it. Most likely, they are using a number of custom-built tools. If they’re lucky, they managed to shoehorn some of the existing systems management tools into doing what they need. Either way, there’s a lot that goes into making this work.
    Now, I’m not going to try to say that widespread availability of these services isn’t going to change things drastically, but it’s certainly not going to suddenly open the floodgates for massive, Google-sized applications.

  2. Thomas,
    First up, I’m not talking about “anyone”. I’m not picturing the Yahoo Pipes audience, but your average competent, intelligent web developer who does more than just Drupal customization and e-commerce integration.
    Second, I’m not talking about building apps with eBay- and Google-sized audiences. Maybe I should have been more clear about that. I’m talking about applications that need access to that scale of computing — the ability to grep for some content across a serious majority of all web pages or store a gazillion records in some db storage system — as one-off actions in order to provide some service that has value to a normal scale audience.
    Finally, while a lot of your criticisms of the feasibility of operating at this scale are true today, they won’t remain so. How long do you think it will be before existing frameworks and communities build tools and best practices that abstract away a lot of these problems? Or before new ones sprout up that are native to this computing environment but still easy to use for our target smart single developer? One of the great things about software is that not everyone has to be as smart as Linus Torvalds or David Heinemeier Hansson for a large number of people to be able to build valuable things on top of the systems they create.

  3. Marcus says:

    Thomas, it’s now easier to launch a web application on a dedicated box than it was 3 years ago and this trend will only continue. Scaling expertise is itself becoming a commodity and it will continue to become even more so.
    And yeah, cloud computing will help. Greg, are you two considering using SimpleDB? I dunno, getting results in XML? I think it looks messy. Then again, frameworks will abstract the pain away (at the cost of performance).
    But as speed gets cheaper, abstraction gets cheaper and more prevalent.

  4. Good points, Marcus.
    Yeah, we’re looking at SimpleDB, though more in the six month timescale than for the immediate future. We already slosh around a lot of data and we’re about to increase that amount by a few orders of magnitude. And we’re beginning to run into some feasibility limits when trying to make big improvements to our data model that involve munging a lot of data. We think something like SimpleDB might be able to ease the pain and cost somewhat, but it’s gonna take major planning for us to get there.
    It’s worth noting that we’re not in this situation because we have 500+ million users, but because our way of providing value is just super data intensive (something like looking for a needle in a haystack). Maybe I should have defined web-scale above in terms of moving around millions of records (and their corresponding resources) rather than having millions of users…

  5. Randy Bias says:

    Greg, I shared some thoughts on your great post in a recent article on my blog here.
    Regards,
    –Randy

  6. Thomas says:

    Sorry, just now remembered to come back and read the response to my comments. Guess they sounded a bit inflammatory. Wasn’t meant to sound that way. Sorry! ;~)
    Anyway, I understand your point, Greg, that you didn’t really mean to say it was meant for “anyone,” but I think my initial reaction was that your description did seem to imply that the day is near when “anyone” could scale out like that.
    I do have to disagree somewhat with Marcus’ point about scaling knowledge becoming a “commodity.” Sure, things might be moving in that direction, but I think it’s a long ways off still.
    My reaction, though, really does hinge on the issue of managing this kind of scale. I remember clearly when IBM started talking about autonomous systems management back in the 90s. A lot of promises have been made over and over in that area and very little has happened. Having spent a good portion of my career in that market and been involved at least indirectly with a number of the open source efforts, I think it’s still a long way off from being ready to take advantage of this kind of scaling. That said, I really hope I’m wrong.
    Either way, I do get the excitement that all this brings, and in that sense I agree with your sentiments. Things ARE changing, and quickly. I’ve been preaching about the wonders of EC2 and similar efforts since I first read about them and will continue to do so until I see them being accepted as a standard way of doing business.

  7. Thomas,
    I totally agree with you about how much expertise is required to manage this kind of scale, and that’s one of the major reasons I’m excited about Amazon and — one has to imagine — other companies going into this field: they manage a lot of the complexity for you. I haven’t actually used SimpleDB yet, but in all of my S3 and EC2 experience, the services pretty much just work. All I think about is the surface area of their APIs and how those relate to my application. Part of that is because those services have beautiful Ruby wrappers (for S3) and great pre-assembled Ruby on Rails-ready images (for EC2) that are a joy to work with — a great example of the open source abstraction and frameworks that I was talking about in regards to this kind of thing.
    It’s not so much that the knowledge becomes a commodity as that it gets embodied in working software that is available to any serious developer (my definition of anyone :).
