I bet you over-engineered your startup

Quick! How do you design distributed, scalable software that does many small things to achieve a common goal?

Most of you shouted "discrete services!" at the screen; others just shook your heads at the buzzword soup in that line. And that's okay too.

Perfect world

Everyone can tell you that building a big web project, one that's going to handle millions of users, withstand the force of a thousand monkeys with typewriters (sorry, developers) and look damn sexy doing it, requires a cacophony of small (but efficient) services.

When your product is but a mashup of internal APIs, beautiful things happen:

  1. Every service only does one little thing and does it well
  2. Developers own their piece of the puzzle
  3. Developers don't step on each other's toes
  4. When something breaks, your product only has a wonky feature rather than being completely shut down
  5. Your product is offensively simple to scale

Wonderful, isn't it?

Every sexy new startup, hot piece of R&D, or rewrite of a clunky old codebase will look like this. It's the bleeding edge of industry best practices (and has been in one form or another ever since we discovered data-code separation). This tickles our engineering innards.

Your new system must be designed around RabbitMQ and Redis and MongoDB and Cassandra and Node.js and Haskell and scalable this and scalable that!

Use the best tool for the job! Every job. What could possibly go wrong?

The real world

Over here in the real world, I'm a user. I don't see your beautiful architecture and how easy it is to scale just by spinning up a few more dynos on Heroku.

All I see is that I need to sign out of all my Gmail accounts so I can sign into the correct Google Analytics account. That it takes me 5 attempts to figure out which Google account has a YouTube account attached, and that the new direct messages notification doesn't even sync between tabs, let alone computers.

I also notice that sometimes "Saving seems to be taking longer than usual" and that sometimes you "Did something wrong, try to reload?", but reloading would lose what I tried to do and I can no longer copy-paste because you hid the interface.

What happens

Last week I prowled Zemanta's hackday for cool stories to write on this blog. The most passionate was @andraz's rant: "We have got to simplify our architecture, man! After years of swearing by small services, I am falling in love with monolithic architectures again" (paraphrased from memory).

If your system is well designed according to industry best practices, there will come a time when most of your resources are spent on getting things talking to one another.

Developer A makes a cool service. The whole system starts using it for whatever.

Developer B wants to add a feature, but hey presto, this or that piece of data isn't in the right database. Because, of course, there are many. Now what?

Bug Developer A to add an API call to their service.

And now Developer B is stuck.

After a while a new API call is created. But now different services need to know when data changes in the ten other services they depend on, so you set up a notification system.

Now services spend the majority of their time chit-chatting about changes. Either waiting for an ACK or being interrupted by "Hey, hey you. Pssst. You, yes you! Hey I changed this bit of data. Thought you might want to know. Okay I'll stop interrupting your critical user-facing task now."

You set up a message queue to deal with all of that and the architecture just grows and grows and grows until it becomes an unwieldy behemoth. All you wanted were simple independent services doing Just One Thing (tm).
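To put a rough number on the chit-chat, here is a back-of-the-envelope sketch. It assumes the worst case, where every service listens to every other service and each change costs one notification plus one ACK per listener. `NotificationBus` and `messages_for_one_write_each` are made-up names for illustration, not anything Zemanta actually runs.

```python
from collections import defaultdict


class NotificationBus:
    """Toy in-process stand-in for a change-notification system (hypothetical)."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # publisher name -> subscriber names
        self.messages_sent = 0

    def subscribe(self, subscriber, publisher):
        self.subscribers[publisher].append(subscriber)

    def publish_change(self, publisher):
        # Assumption: one notification out and one ACK back per subscriber.
        fan_out = len(self.subscribers[publisher])
        self.messages_sent += 2 * fan_out
        return fan_out


def messages_for_one_write_each(service_count):
    """Every service depends on every other one; each changes its data once."""
    bus = NotificationBus()
    names = [f"service-{i}" for i in range(service_count)]
    for subscriber in names:
        for publisher in names:
            if subscriber != publisher:
                bus.subscribe(subscriber, publisher)
    for name in names:
        bus.publish_change(name)
    return bus.messages_sent


if __name__ == "__main__":
    for n in (3, 10):
        print(n, "services:", messages_for_one_write_each(n), "messages")
```

Under those assumptions, 3 services exchange 12 messages for one round of changes and 10 services exchange 180. The traffic grows with the square of the service count, while the useful work does not.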

It can get even weirder with user accounts. In Zemanta's early days it seemed only obvious to have three different systems for bloggers, advertisers and API developers. Makes sense, right? Three completely different user groups with absolutely no overlap.

Until five years later you start thinking that maybe bloggers should be able to advertise too and that sometimes an advertiser might want to write a blog or twenty. And your more advanced advertisers would like access to the API, so it makes sense to put them in the API system because you already have rate limiting and all the infrastructure ...

It gets messy.

So what can you do ...

The solution is as simple as it is elegant - just simplify your architecture!

In the old days, developers discovered that a perfectly designed (read: fully normalized) relational database can become very slow - and that you can speed it up with a denormalized schema that would make your college professor cry.
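If that trick is new to you, here is a minimal sketch using Python's built-in `sqlite3`. The tables (`authors`, `posts`, `posts_flat`) are invented for illustration. The "professor-approved" layout needs a join to show a post with its author, while the flattened table copies the author's name onto every row and skips the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: the author name lives in exactly one place.
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES authors(id),
        title TEXT
    );
    -- Denormalized: author_name is duplicated onto every post row.
    CREATE TABLE posts_flat (id INTEGER PRIMARY KEY, author_name TEXT, title TEXT);
""")
conn.execute("INSERT INTO authors VALUES (1, 'alice')")
conn.execute("INSERT INTO posts VALUES (1, 1, 'Hello')")
conn.execute("INSERT INTO posts_flat VALUES (1, 'alice', 'Hello')")

# Same answer, but the first query has to join two tables to get it.
normalized = conn.execute(
    "SELECT p.title, a.name FROM posts p JOIN authors a ON a.id = p.author_id"
).fetchall()
flat = conn.execute("SELECT title, author_name FROM posts_flat").fetchall()
```

The price is that renaming an author now means touching every one of their rows in `posts_flat`. That is the same kind of trade you make when you merge services: less purity, fewer moving parts on the hot path.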

You can take the same idea and smash it against your architecture like a hammer. Start merging services until you come to the bare essentials that have to be discrete.

At the very least, merge the whole database into a single entity. There's no shame in having a "database service" that can handle all sorts of requests ... oh wait, most modern DBs already act a bit like that. Maybe there's just no need to have five different database engines?

Ok ok, maybe you can't completely merge the database. Then at least have a single set of users.

Although how you intend to do The Machine Learning by having your data in ten databases that aren't speaking is beyond me. And I know you're doing machine learning because this is 2012 and everyone is doing that.
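As for that single set of users, a minimal sketch could look like the following. It assumes one account per email address with a set of roles attached, instead of three separate account systems. `Role`, `User` and `UserStore` are hypothetical names, and a real version would sit on top of an actual database rather than a dict.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    BLOGGER = "blogger"
    ADVERTISER = "advertiser"
    API_DEVELOPER = "api_developer"


@dataclass
class User:
    email: str
    roles: set = field(default_factory=set)


class UserStore:
    """One user table for everybody; what a user may do is just a role."""

    def __init__(self):
        self._users = {}  # email -> User

    def grant(self, email, role):
        # Creates the account on first use, then adds the role to it.
        user = self._users.setdefault(email, User(email))
        user.roles.add(role)
        return user

    def has_role(self, email, role):
        user = self._users.get(email)
        return user is not None and role in user.roles

    def count(self):
        return len(self._users)
```

When a blogger decides to advertise five years later, it is one more `grant("someone@example.com", Role.ADVERTISER)` on the account they already have, not a migration between systems.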

