Data Normalization – Is It Still Relevant In A World of NOSQL?

Data ModelSome years ago, I wrote an article for this magazine entitled “SQL by Design: Why You Need Database Normalization”. At that time, NoSQL databases were a tiny percentage of the installations, and relational ruled the land.

Today the landscape has changed, and we have relational database management systems like SQL Server sharing the spotlight with variations of the NoSQL technology. So, I thought we could revisit the original reasons for normalizing data and see if they’re still relevant.

Assumptions: when you normalize a database, you’re targeting the following goals:
– arranging data into logical groupings such that each group describes a small part of the whole;
– minimizing the amount of duplicate data stored in a database;
– organizing the data such that, when you modify it, you make the change in only one place;
– building a database in which you can access and manipulate the data quickly and efficiently without compromising the integrity of the data in storage.

The original article contended that “Data normalization is primarily important in the transactional, or online transactional processing (OLTP), database world, in which data modifications (e.g., inserts, updates, deletes) occur rapidly and randomly throughout the stored data.” It then went on to point out that data warehouses, which often contain volumes of denormalized and precalculated/summarized data, often ignore the rules of data normalization.

To the data warehouse model we can add many of the current installations of NoSQL databases — databases that are engineered to capture semi-structured data, from data sources over which quality we have little if any control, which data meaning can actually shift over time.  It seems to me that if we want to be able to integrate NoSQL data into our own highly transactional, well-defined, conventionally-normalized financial data, for instance, then yes — we need to understand the NoSQL data, its meaning, where it came from, its quality, its integrity. Well, if we want accurate answers, that is, we need to know.

Are you thinking that the NoSQL installation at your shop is a stand-alone system which is being mined as the data is gathered, and that nothing else will ever be done with it? Good luck with that idea! If IT history has shown us nothing else, it’s clear that data is being used and reused in ways that were never imagined when the collection systems were put in place.

If you’re working in a hybrid (SQL plus NoSQL) environment, I believe that the highly-structured SQL data is your stake in the ground; it’ll give you the basis on which to build additional knowledge and marketing expertise when integrated correctly with the NoSQL data that’s being collected in concert with it. And that’s why I think you still need to normalize your (SQL) data.

Got other ideas? Let’s hear them!


Read the original article “SQL by Design: Why You Need Database Normalization” at http://sqlmag.com/database-performance-tuning/sql-design-why-you-need-database-normalization