ifacethoughts

Think Before You Normalize

Dare Obasanjo says do not normalize immutable data, and refers to Pat Helland’s presentation. Database Normalization is an activity in database design to reduce redundancy and eliminate update anomalies. In short, it minimizes possibility of loss of data integrity while updating data. It is one of the best practices for database design.

But of course, with any technical best practice, there is a cost involved. And why this tip is a highlight is because it asks us to consider cases in which that cost-benefit analysis can tilt on the other side. Usually a normalization ends up into multiple tables, and retrieval of data from all of them might involve multiple joins, which can be expensive. The benefits of normalization are during updation of data, which means that it does is not so beneficial in case of immutable data, or data that does not change. The question to ask is why should be bear the cost of normalization in such a case, without its benefit. A very valuable question to ask.

Fundamentally this is not a new idea. Denormalization has always followed normalization to tune for performance by adding redundant data. The reason being same, it can eliminate certain joins. However, it is still important to be able to identify that belongs to one entity, and separate out the association data. Some amount of normalization will always be involved there. I feel more comfortable with this idea by considering it to be the same as denormalization rather than not to normalize at all.

Having said that there is huge credit to the perspective that normalization is not beneficial in case of immutable data. Very useful.

Discussion [Participate or Link]

  1. mantrid said:

    Data normalization is a process used mainly to minimize data redundancy. Any redundancy delivers possible troubles in use-cases which were omitted in the design process or were erronously implemented. By introducing normalized structures to the db schema we leverage its quality and internal consistency.

    Normalization for dbs is like good frameworks for code. We can ofcourse resign of using a framework to speed up code but maintaining and development of such code is hard, error prone and expensive.

    Instead of just denormalization I would recommend using another aproach: data warehousing. Warehouses are “semi-denormalised” databases used for high performance insertion and search.

  2. Carlo Vella’s Bits and Bytes » Think before you normalise said:

    [...] of loss of data integrity while updating data. It is one of the best practices for database. More info here Bookmark [...]

  3. jon o said:

    Just a warning : Be careful about what data you consider immutable!

    Working on a lightweight account management system for a client, one of my first large-scale projects, I made the mistake of ass-uming that people’s names were immutable data. I mean, it’s not like someone’s name is going to change, right?

    Whoops! Within a few months I had to redo the whole database. People get married and change their last name, add professional titles like CPA, and request to go by Chuck not Charles.

    Just a warning - I totally agree with the concept. Thanks for the post!

  4. Abhijit Nadgouda said:

    mantrid, I fully agree with you that there is a tipping point after which data warehousing makes more sense.

    jon, good point there. Seeming immutable data is dangerous. There are some straightforward ones like financial data, but I agree with you that data model should be sure about the immutability.

  5. Daniel said:

    I have to say, that I could not agree with you in 100% regarding o.us poetry, but it’s just my opinion, which could be wrong :)

Say your thought!

Who are you?

If you want to use HTML you can use these tags: <a>, <em>, <strong>, <abbr>, <code>, <blockquote>. Closing the tags will be appreciated as this site uses valid XHTML.

freshthoughts

freshcomments

personalfavorites

contactme

Abhijit Nadgouda
iface Consulting
India
+91 9819820312
Y!: anadgouda
GTalk: anadgouda@gmail.com
MSN: anadgouda@hotmail.com
Skype: anadgouda

currentproject

Complete Wellbeing

thoughtfulthoughts

The most important ideas that emerge from basic science have never been protected by patents and never should be.
Joe Stiglitz

badgesand...

This is the weblog of Abhijit Nadgouda where he writes down his thoughts on software development and related topics. You are invited to subscribe to the feed to stay updated or check out more subscription options. Or you can choose to browse by one of the topics.

Twitter - IMHO, Epiphany is by far the best browser.