Letting go of normalization – an introduction
May 21, 2008
The Google App Engine discussion list has been fairly active with people trying to adapt their relational database habits and knowledge to the strengths and shortcomings of App Engine’s Bigtable-based Datastore. For many, the pondering starts with “um, OK, how about JOINs?” That question keeps coming up, and a few substitutes have been proposed.
If one even slightly “gets it,” though, the next step is to figure out the new paradigm at work here. How do we need to think about the design of our queries, our models, and our applications as a whole? There’s a dearth of documentation, successful code examples, and explicated best practices, and the discussion list has been slowly filling that gap. At the heart of much of the advice gathered there is the need to denormalize data. Models should be designed with a view towards the ways in which the data will actually be used. A little extra effort should go into storing data so that it is ready for use on read, avoiding complicated joins and unnecessary hits to the datastore. These and other such points are counterintuitive to the relationally-trained mind.
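To make the trade-off concrete, here is a minimal sketch in plain Python. It does not use the App Engine Datastore API; the FakeDatastore class, the "Author" and "Comment" kinds, and every function name are hypothetical stand-ins invented for illustration. It only shows the general idea: the normalized version needs two reads to display a comment, while the denormalized version copies the author's name into the comment at write time and needs one.

```python
from collections import defaultdict


class FakeDatastore:
    """Hypothetical in-memory key-value store that counts reads.

    This is an illustration only, not the App Engine Datastore API.
    """

    def __init__(self):
        self.kinds = defaultdict(dict)
        self.reads = 0

    def put(self, kind, key, entity):
        self.kinds[kind][key] = dict(entity)

    def get(self, kind, key):
        self.reads += 1
        return self.kinds[kind][key]


# Normalized: the comment stores only a reference to its author.
def post_comment_normalized(db, comment_id, author_id, text):
    db.put("Comment", comment_id, {"author_id": author_id, "text": text})


def render_normalized(db, comment_id):
    comment = db.get("Comment", comment_id)
    # Second read: a hand-rolled "join" to fetch the author's name.
    author = db.get("Author", comment["author_id"])
    return f"{author['name']}: {comment['text']}"


# Denormalized: do a little extra work on write so reads are cheap.
def post_comment_denormalized(db, comment_id, author_id, text):
    author = db.get("Author", author_id)
    db.put("Comment", comment_id, {
        "author_id": author_id,
        "author_name": author["name"],  # copied in, ready for display
        "text": text,
    })


def render_denormalized(db, comment_id):
    comment = db.get("Comment", comment_id)  # single read, no join
    return f"{comment['author_name']}: {comment['text']}"
```

The price of the cheaper read is duplicated data: if an author's name changes, every comment carrying a copy of it has to be rewritten (or the stale name tolerated). That is the disk space and write effort being traded for read speed.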
Now Todd Hoff has written the best overview of the issues I’ve seen to date, titled “How I Learned to Stop Worrying and Love Using a Lot of Disk Space to Scale.” This is definitely recommended reading for anyone starting out with App Engine, required reading if you’re thinking of porting an app (say, one based on Django with a MySQL back-end), and remedial reading if you built (or started to build) an app that tried to emulate what works in relational databases and found something was off.