Friday, August 9, 2013

Understanding Database table normalization

I am learning the fundamentals of how to "normalize" a database. Normalization means in the simplest sense - "remove data repeat, remove data constraints (or dependency) across tables when performing data operations (during create, update and delete operations on table).

The simplest form of Normalization is the First Normal Form (1NF)
The following rules must be followed to make a database table 1NF conformed.
  1. Each column must have a unique name.
  2. The ordering or rows and columns in the table is irrelevant.
  3. Every column must have a single data type - that is if a column holds string values, then it cannot hold integer values.
  4. Each row must hold unique values - no two rows can be the same.
  5. Each column must contain a single value.
  6. Columns cannot contain repeating groups. 
An example to illustrate point #6 would be an account holder who has multiple bank accounts
 CustID  Name         SSN-4       Bank Name1      Bank Name2       Bank Name3
  A123   Tom Brockaw  1234          BoFA          WellsFargo       Chase
 
Instead the bank data should be put in a seperate table 
Customer table
CustID  Name         SSN-4
A123   Tom Brockaw  1234
 
Bank table 
Cust ID Bank name
A123  BoFA
A123  WellsFargo
A123 Chase

Next step is to convert a database table to 2NF compliance. A table is considered 2NF when it satisfies
  1. 1NF compliant
  2. All of the non-key fields depend on all of the key elements.
To illustrate point #2,here is an example

NameLanguagesCurrent Home Location
SeanC++268 San Bruno
SeanJava268 San Bruno
SeanPerl268 San Bruno
JimC114 Wall Street
JimJava114 Wall Street
JimC#114 Wall Street

Here the key is Name + Languages but the "Current Home Location" is based on part of the key - Name and not on "Languages". Therefore the non-key field ("Current Home Location") can be taken out of the table and placed in a separate table.
Now the 2 NF table is
Name Languages
Sean C++
Sean Java
Sean Perl
Jim C
Jim Java
Jim C#

Name Current Home Location
Sean  268 San Bruno
Jim 114 Wall Street.


A 3NF complaint table is
  1. 2NF compliant
  2. It contains no transitive dependencies - in this case a non key is dependent on another non key.
Here is an example of a Employees table which is not 3 NF compliant with Employee as primary key. 

Employee                        Project                     Department
Alison                             Java                          Computer Science
Jim                                  C#                            Computer Science
John                                SIP                           Electronics & Telecommunications
Tim                                 CTI                           Electronics & Telecommunications
Robert                            Long Jump                Athletics
Sean                               Sprint                         Athletics
Tara                                Java                           Computer Science

In this case Department is dependent on the Project an employee takes and both Project and Department are non-keys. In order to remove the transitive relationship, we take out the dependent field (Department) into a seperate table and link the dependent field with the other non-key (Project)

Employee (PK)               Project
Alison                             Java                         
Jim                                  C#                           
John                                SIP                        
Tim                                 CTI                        
Robert                            Long Jump              
Sean                               Sprint                       
Tara                                Java

Project (PK)                           Department
Java                          Computer Science
C#                            Computer Science
SIP                           Electronics & Telecommunications
CTI                           Electronics & Telecommunications
Long Jump                Athletics
Sprint                         Athletics

Boyce Codd Normal Form - 3 NF + Every determinant is a candidate key ( fields that may be dependent on part of the primary key)
 Note: - A candidate key is a "minimal" superkey.
4NF complaint table - which is BCNF compliant + It does not contain an unrelated multi-valued dependency

5NF - 4NF + does not contain any related multi-valued dependency

DKNF (Domain/Key Normal Form) - the table contains no constraints except domain and key constraints


5NF - 4 NF + does not contain any related multi-valued dependency