- Introduction
- Picking a mnemonic acronym
- Constraint-Driven
- Optimized
- Responsive
- Efficient
- CORE Database Design
- Thank You
Introduction
In this post, we’ll cover some database design principles and package them up into a catchy mnemonic acronym.
Software engineering is loaded with acronyms like this. For example, SOLID describes five principles (Single responsibility, Open-closed, Liskov substitution, Interface segregation, and Dependency inversion) that promote good object-oriented design.
Databases are loaded with acronyms too, for example "ACID" for the properties of a transaction, but I wasn't familiar with one that schema designers could keep in mind while they're working.
Thus, the motivation for this acronym was to help the schema designer by packaging up some principles of good database schema design. It's not based on research or academia though, so don't take it too seriously. That said, I'd love your feedback!
Let’s get into it.
Picking a mnemonic acronym
In picking an acronym, I wanted it to be short and have each letter describe a word that’s useful, practical, and grounded in experience. I preferred a real word for memorability!
The result was “CORE.” Let’s explore each letter and the word behind it.
Constraint-Driven
The first word (technically two) is "constraint-driven." Relational databases offer rigid structures, but they can be changed while online, a form of flexibility in their evolution. We evolve their structure through DDL, changing data types and constraints as new entities and relationships are added.
Constraint-driven refers to leveraging all the constraint objects available, designing for our needs today, but also, in a more general sense, to applying restrictions to designs in pursuit of data consistency and quality.
Let’s look at some examples. Choose appropriate data types, like a numeric data type and not a character data type when storing a number. Use NOT NULL for columns by default. Create foreign key constraints for table relationships by default. Validate expected data inputs using check constraints. For small databases, use integer primary keys. If the database grows huge later, we can migrate the structure and data.
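As a sketch, here's what a table applying these defaults might look like (the table and column names are hypothetical):

```sql
-- Hypothetical "orders" table applying the constraint-driven defaults:
-- an integer primary key, NOT NULL columns, a foreign key, and a check constraint.
CREATE TABLE orders (
    id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id integer NOT NULL REFERENCES customers (id),
    quantity integer NOT NULL CHECK (quantity > 0),
    created_at timestamptz NOT NULL DEFAULT now()
);
```

Each constraint rejects bad data at the database layer, rather than hoping every application code path validates it.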
The mindset is to prefer rigidity, design for today, then leverage the ability to evolve flexibly later, as opposed to designing for a hypothetical future state.
Optimized
Databases present loads of optimization opportunities. Relational data is initially stored in a normalized form to eliminate duplication, but later denormalizations can be performed when read access is more important than eliminating duplication.
When our use cases are not known at the outset, plan to iterate on the design, changing the structure to better support the use cases that have become better known over time. This will mean evolving the schema design through DDL changes.
This applies to tables, columns, constraints, indexes, parameters, queries, and anything that can be optimized to better support real use cases.
Queries are restructured and indexes are added to reduce data access. Pursue highly selective access (a small proportion of rows) on high-cardinality (highly unique) data to reduce latency.
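For example, a query filtering on a high-cardinality column can be supported by a B-tree index (names here are hypothetical):

```sql
-- Hypothetical example: an index on a high-cardinality column supports
-- highly selective access, so only a small proportion of rows is read.
CREATE INDEX orders_customer_id_idx ON orders (customer_id);

-- Confirm the plan uses the index and touches few buffers:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 12345;
```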
Critical background processes like VACUUM get optimized too. Resources (workers, memory, parallelization) are increased to support use cases.
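One way to do that in Postgres is per-table storage parameters, so a large, busy table is vacuumed more aggressively than the global defaults allow (the table name and values are hypothetical, not recommendations):

```sql
-- Hypothetical example: make autovacuum trigger sooner and throttle less
-- on one hot table, without changing the global settings.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_cost_delay = 1
);
```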
Responsive
When problems emerge, like unexpected data at the column or row level, missing referential integrity, or query performance problems, engineers inspect logs, catalog statistics, and parameters, from the core engine and third-party extensions, to diagnose the issues.
When DDL changes are ready, the engineer applies them in a non-blocking way, in multiple steps as needed. Operations are performed “online” by default when practical.
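A common multi-step, non-blocking pattern in Postgres looks like this (object names are hypothetical):

```sql
-- Hypothetical example: add a foreign key to a large table without
-- holding a long, blocking lock.
-- Step 1: add the constraint without validating existing rows
-- (requires only a brief lock).
ALTER TABLE orders
    ADD CONSTRAINT orders_customer_id_fkey
    FOREIGN KEY (customer_id) REFERENCES customers (id) NOT VALID;

-- Step 2: validate existing rows separately, under a weaker lock.
ALTER TABLE orders VALIDATE CONSTRAINT orders_customer_id_fkey;

-- Indexes are built "online" with CONCURRENTLY to avoid blocking writes.
CREATE INDEX CONCURRENTLY orders_created_at_idx ON orders (created_at);
```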
DDL changes live in source code files that are reviewed and tracked, and a copy of the schema design is kept in sync across environments.
Parameter (GUC) tuning (Postgres: work_mem, etc.) happens in a trackable way. Parameters are tuned online when possible, and scoped narrowly, to optimize their values for real queries and use cases.
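For instance, rather than raising work_mem globally, it can be scoped to a transaction or a role (the values and role name below are hypothetical):

```sql
-- Hypothetical example: raise work_mem only for one expensive query,
-- inside a transaction, instead of server-wide.
BEGIN;
SET LOCAL work_mem = '256MB';  -- applies to this transaction only
SELECT customer_id, count(*) FROM orders GROUP BY customer_id;
COMMIT;

-- Or scope it to a role, which is still trackable as a DDL change:
ALTER ROLE reporting_user SET work_mem = '128MB';
```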
Efficient
It’s costly to store data in the database! The data consumes space and accessing data unnecessarily adds latency.
Data that’s stored should either be queried later or archived. There’s a continual process of considering use cases and archiving any unneeded data.
To minimize space consumption and latency, and to reduce system complexity, continually remove tables, columns, constraints, and indexes once they’re no longer required.
Server software is upgraded at least annually so that performance and security benefits can be leveraged.
Huge tables are split into smaller tables using table partitioning for more predictable administration.
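A sketch of declarative range partitioning in Postgres (table and partition names are hypothetical):

```sql
-- Hypothetical example: partition a large events table by month.
-- Note the partition key must be part of the primary key.
CREATE TABLE events (
    id bigint GENERATED ALWAYS AS IDENTITY,
    occurred_at timestamptz NOT NULL,
    payload jsonb,
    PRIMARY KEY (id, occurred_at)
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Old partitions can be detached and archived or dropped cheaply:
ALTER TABLE events DETACH PARTITION events_2024_01;
```

Per-partition operations like this keep administration (VACUUM, archiving, removal) predictable as the data grows.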
CORE Database Design
There’s lots more to evolving a database schema design, but these principles are a few I keep in mind.
Did you notice anything missing? Do you have other feedback? Please contact me with your thoughts.
Thank You
Over the years, I’ve learned a lot from Postgres.fm hosts Nikolay and Michael, and other community leaders like Lukas and Franck, as they’ve shaped my database design choices.
I’m grateful to them for sharing their knowledge and experience with the community.
Thanks for reading!