Normalization:
Database normalization is a process by which an existing schema is modified to bring its component tables into compliance with a series of progressive normal forms.
The goal of database normalization is to ensure that every non-key column in every table is directly dependent on the key, the whole key and nothing but the key and with this goal come benefits in the form of reduced redundancies, fewer anomalies, and improved efficiencies (While normalization tends to increase the duplication of data, it does not introduce redundancy, which is unnecessary duplication.).
About the Key
Column(s) C is primary key for table T if:
Property 1: All columns in T are functionally dependent on C
Property 2: No subcollection of columns in C (assuming C is a collection of
columns and not just a single column) also has Property 1
Candidate Keys -
Column(s) on which all other columns in table are functionally dependent
Alternate Keys -
Candidate keys not chosen as primary keys
First Normal Form
The first normal form (or 1NF) requires that the values in each column of a table are atomic. By atomic we mean that there are no sets of values within a column.
Second Normal Form
Where the First Normal Form deals with atomicity of data, the Second Normal Form (or 2NF) deals with relationships between composite key columns and non-key columns. The normal forms are progressive, so to achieve Second Normal Form, your tables must already be in First Normal Form.
The second normal form (or 2NF) any non-key columns must depend on the entire primary key. In the case of a composite primary key, this means that a non-key column cannot depend on only part of the composite key.
Third Normal Form
Third Normal Form (3NF) requires that all columns depend directly on the primary key. Tables violate the Third Normal Form when one column depends on another column, which in turn depends on the primary key (a transitive dependency).
One way to identify transitive dependencies is to look at your table and see if any columns would require updating if another column in the table was updated. If such a column exists, it probably violates 3NF.
Boyce-Codd Normal Form (BCNF)
When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF. 3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys i.e. composite candidate keys with at least one attribute in common.
BCNF is based on the concept of a determinant. A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent. A relation is in BCNF is, and only if, every determinant is a candidate key.
Consider the following relation and determinants.
R(a,b,c,d)
a,c -> b,d
a,d -> b
To be in BCNF, all valid determinants must be a candidate key. In the relation R, a,c->b,d is the determinate used, so the first determinate is fine.
a,d->b suggests that a,d can be the primary key, which would determine b. However this would not determine c. This is not a candidate key, and thus R is not in BCNF.
Example
• Unnormalised
Grade_report(StudNo,StudName,(Major,Advisor(CourseNo,Ctitle,InstrucName,InstructLocn,Grade)))
• 1NF Remove repeating groups
Student(StudNo,StudName)
StudMajor(StudNo,Major,Advisor)
StudCourse(StudNo,Major,CourseNo,
Ctitle,InstrucName,InstructLocn,Grade)
• 2NF Remove partial key dependencies
Student(StudNo,StudName)
StudMajor(StudNo,Major,Advisor)
StudCourse(StudNo,Major,CourseNo,Grade)
Course(CourseNo,Ctitle,InstrucName,InstructLocn)
• 3NF Remove transitive dependencies
Student(StudNo,StudName)
StudMajor(StudNo,Major,Advisor)
StudCourse(StudNo,Major,CourseNo,Grade)
Course(CourseNo,Ctitle,InstrucName)
Instructor(InstructName,InstructLocn)
• BCNF Every determinant is a candidate key
– Student : only determinant is StudNo
– StudCourse: only determinant is StudNo,Major
– Course: only determinant is CourseNo
– Instructor: only determinant is InstrucName
– StudMajor: the determinants are
StudNo,Major, or
Advisor
Only StudNo,Major is a candidate key.
• BCNF
Student(StudNo,StudName)
StudCourse(StudNo,Major,CourseNo,Grade)
Course(CourseNo,Ctitle,InstrucName)
Instructor(InstructName,InstructLocn)
StudMajor(StudNo,Advisor)
Adviser(Adviser,Major)
A complete normalization of tables is desirable, but you may find that in practice that full normalization can introduce complexity to your design and application. More tables often means more JOIN operations, and in most database management systems (DBMSs) such JOIN operations can be costly, leading to decreased performance. The key lies in finding a balance where the first three normal forms are generally met without creating an exceedingly complicated schema.