Normal Forms in Database Design
Normal forms in database design are a series of guidelines (primarily 1NF, 2NF, 3NF, and higher like BCNF) for organizing relational databases to minimize data redundancy and dependency. They achieve this by progressively decomposing tables based on functional dependencies, starting with ensuring atomic values and unique rows (1NF), eliminating partial dependencies (2NF), and removing transitive dependencies (3NF). These forms are important because they prevent insertion, update, and deletion anomalies—issues that cause inconsistencies or data loss during modifications—while improving data integrity, reducing storage waste from duplicates, and making the database easier to maintain and query efficiently. In practice, achieving at least 3NF is standard for most applications to ensure reliability without overcomplicating the schema.
First Normal Form (1NF):
- Attribute values must be atomic. That is, for example, there should be no lists. Thus, an attribute like CustomerEmails in the Customer entity, which might have the value “ivan@gmail.com;ivan-super-coder@yahoo.com”, violates the first normal form. The solution is to create a separate table CustomerEmail with fields CustomerID and Email. Composite values should also be avoided if there is a potential need to work with parts of those values individually. For example, if the Customer entity has an attribute FullName (full name), and there might be a need to work only with the last name (e.g., extracting the last name from the full name in the program and displaying only it, or searching by last name in the database), then the FullName attribute violates the first normal form.
- All instances of the entity must have the same number of attributes and their values. A real-world example (violating 1NF) commonly encountered in practice: The Order entity has attributes Supplier1, Supplier2, Supplier3 to indicate the suppliers to whom the order may be sent. That is, the customer's order is first sent to the primary supplier and their ID is recorded in the Supplier1 field; if that supplier cannot fulfill the order, it is sent to another supplier and their ID is recorded in Supplier2, and so on. Accordingly, some rows in the table have only Supplier1 filled, others have Supplier1 and Supplier2, and yet others have all three. This violates the first normal form. The solution is to create a separate table SupplierOrder with fields CustomerOrderID, SupplierID, and other attributes describing the order to the supplier.
- All instances of the entity must be distinct. This condition requires the presence of a key (primary or unique) that allows differentiating between entities, and no entities with identical key values should be stored. Technically, this can be achieved by adding a surrogate key (such as an auto-increment counter or GUID), but in that case, the table may still contain rows where all other attribute values are identical and actually refer to the same real-world entity (e.g., two rows with different CustomerID but describing the same actual customer) — this is incorrect. In such cases, you can either add another unique key (e.g., based on the customer's passport number), or use a natural primary key altogether (i.e., one based on natural attributes rather than automatically generated ones) instead of a surrogate primary key.
Second Normal Form (2NF):
- The entity must be in first normal form.
- Every non-key attribute of the entity must depend on the entire primary key (and not just on one part of the primary key). For example, consider a table BookAuthor that stores information common to both an author and a book (e.g., the percentage of sales revenue the publisher will pay the author for a specific book). The primary key is composite and consists of the attributes AuthorID and BookISBN. Non-key attributes: AuthorPercent, BookTitle, AuthorFirstName, AuthorLastName. The AuthorPercent attribute depends on the entire primary key — both the author's ID and the book's ISBN, not just one or the other — so it is fine. However, the BookTitle attribute depends only on a PART of the primary key, namely the book (i.e., only on the BookISBN attribute). Therefore, the BookTitle attribute violates the second normal form. Similarly, AuthorFirstName and AuthorLastName violate 2NF. The solution is either to remove the attributes BookTitle, AuthorFirstName, and AuthorLastName from the BookAuthor table (if they already exist in the corresponding books and authors tables), or to move them to the appropriate books and authors tables (if, for some strange reason, they are not there yet).
Third Normal Form (3NF):
- The entity must be in second normal form.
- Non-key attributes must not depend on other non-key attributes (i.e., all non-key attributes must depend only on the key). For example, consider a table Book where the primary key is BookISBN, and there are also attributes Title, Price, PublisherID, PublisherCity. The presence of the PublisherCity attribute violates the third normal form because this attribute actually depends on the non-key attribute PublisherID, not on BookISBN.
Comments
Post a Comment