Picture yourself in your desk: you're starting a new project from scratch. A blank page where the programming possibilities are limitless and the best. is: you can design your own database.
Designing a database can be a little confusing in the beginning; especially, if you are new to it. But don't worry, I'll help you a bit with the basics to get started. ✨ 🙌🏻 ✨
Grab your coffee. ☕️
Having a well-designed database allows us to access information more efficiently, accurately and helps us avoid data redundancy (you know, duplicate information and data that is not optimized at all).
Therefore, it is important to follow these basic steps:
- Define the purpose of your database
- Identify your data
- Organize your data in tables
- Organize the tables by columns
- Define the primary keys and relationships between your tables
- Normalize your database and repeat
Let's go step by step
Define the purpose of your database
This means: what will my database do for me? For example:
- Keep a record of students in a school
- Register the participants of a contest
- Manage the information of employees of a company
There may be N reasons to have a database, but it is necessary that you understand what the purpose of your database is. In this way you will know how to identify the information you need more easily.
Identify your data 🤓
Identify what data you need to save in the database.
You can write everything down as a list. It does not have to be perfect (it is a kind of draft) because the objective is to extract the data that you need to be saved and gradually improve details each time the documentation is read or the project is discussed.
Organize the information in tables ✍🏻
Once you have gathered the information and made your draft, divide that list into entities, that is, objects, people, locations, etc. Simply put, an entity is "something" in the real world that contains attributes or characteristics.
In a database, each entity is a table. For example, suppose we have the following list from an online store project (the following is a very small list and is only to exemplify entity segmentation):
- Cell phones
- Personal data of the client
- Client email
- Client password
- Orders sent home
- Inventory of products
From that list we can then identify large groups of entities such as:
- Cell phones
- Personal data of the client
- Home delivery
Organize the tables by columns 👩🏻🏫
Since we have the segmentation of the entities, now it's time to define the attributes or characteristics as well as the type of data that make up each entity or in this case each table.
Based on the previous list (we will only take two entities for this example):
- Product name
- Product description
- Unit cost
Since we have the columns of each table identified, we can review and polish our columns again.
(Why is phone a string? 'Cause might need to indicate the country code before the phone number)
Define primary keys and relationships between tables 👯♀️
Each table must include a column that identifies that each record is unique. It is usually a 1 to N number field ... and should never be empty. We call this the primary key (PK).
In the table we have two records: Meredith and Bernie.
Each record is unique so we define a field called id where we record a unique value for both Meredith and Bernie; 1 and 2 respectively.
Why is the name not our primary key? This is because names are not unique data. Take it into account! :)
Now, to relate our tables to each other we will use foreign keys (FK). It is a column that basically refers to an external table.
Let's see an example where we relate our table of Customers with our table of Products and the table of Orders.
A minimum order has the following attributes:
- Customers who bought
- Products bought
Then our Orders table would be like this (taking into account primary and foreign keys):
- client_id: is the foreign key that refers to the id field that is the primary key of the client table
- product_id: is the foreign key that refers to the primary key of the product table
Therefore, if we see the customers and products tables:
We can say that: -The purchase with the ID: 1 was made by Meredith and she bought a laptop. -The purchase with the ID: 2, was made by Meredith too and she bought an android phone
- The purchase with the ID: 3, was made by Bernie and he bought an iPhone
In our diagram we see the above as follows:
This is how we relate our tables, using foreign keys.
Normalize your database and repeat 👯♂️
Once the first version of our database is finished, analyze:
- Is there a missing attribute in any table?
- Is there repeated information?
- Do you need an additional table?
- Is the data atomic?
As we make a first version of the database we will surely find improvements to be made. The previous questions are an example of how we can analyze and refine our design.
If you're experienced with database design, you may have noticed our Customers table needs to be normalized. Normalization allows us to identify if our tables are structured properly, so there are 3 levels of normalization (there are more but the first three rules are the most used in database design)
First normal form 🕺🏻
- Independent data in each column
- A column must not have multiple data
- There is no specific order in the rows, that is, there are no dependencies if a record is discharged before another.
- No duplicate data
- No groups of columns with the same type of repetitive data, such as address_1, address_2, address_3, etc. Ideally, make an additional table for this.
- Have a primary key
Non-normalized Customer table
We see that
- First name is a field with multiple values, for example, it has first name, middle name and last name. We can make this field atomic.
- Also name is a duplicate value since Meredith has two different addresses and phone numbers. We must avoid duplication.
So, this is our clients table now:
Second normal form 💃🏻
The rule here is to create additional tables for the repeating data and establish a relationship using foreign keys.
In the previous example, we removed the address and telephone information from the Customers table since it generated duplicate data in the user's name. So having a specific table for addresses is ideal.
A client can have several addresses and phone numbers.
By relating we can create a table of Customers and Addresses:
- Meredith with customer ID 1 has the address with ID 1 and the address with ID 3 (because she is a customer with multiple addresses)
- Bernie with Customer ID 2 has the address with ID 2
Third normal form 🕴🏻
In this rule it tells us to remove the fields that do not depend on the primary key. That said, we can say that the Clients table can be simplified even more, removing the access information and leaving only basic client details and other access data.
But although it would be ideal to continue with a normalization in its third phase, it is also possible that too many tables affect the performance of the database. It is suggested to apply it only to data that changes frequently.
Our client table, as it currently, is functional and optimized.
What other example of normalization would you do? Do you normalize your tables automatically yet?
What tools do you use to design your database? I use pencil and paper for personal use and when I have to collaborate I use applications such as: Draw.io
I hope this post was helpful and allows you to explore more about databases.
See you in the next SQL 😉