Database Partitioning and CAP theorem

As the database grows in size, maintaining acceptable performance becomes more complex. Partitioning is one method used to manage large and busy databases.

So what is database partitioning?

Simply put in layman terms, it is splitting data across multiple database servers

  • It is done by a consistent method so you always know where the data is
  • There can be several different methods of partitioning, for instance:
    • ranges A-L, M-Q, R-Z.
    • lists: building a database of books can be categorised in textbooks, sci-fi, cookbooks
    • hashes: function returning a value to determine membership

Why partition databases?

  • Storage limitations: entire dataset may not fit on one server
  • Performance: splitting loads between partitions can make faster writes
  • Availability: ensures you can get the data even if one server gets busy

When not to partition?

  • when your dataset is small

Partitioning in relational databases vs NoSQL

  • Relational databases can be partitioned horizontally or vertically
  • Horizontal partitioning (also known as sharding) puts different rows on different partitions
  • Vertical partitions puts different columns on different partitions

Partitioning non relational databases depend on the database type

  • Key-value and document databases are typically partitioned horizontally
  • Tabular databases can be horizontally and vertically partitioned
Understanding how partitioning works is key to making informed decisions about database performance.

What is CAP theorem?

Finding the ideal database for your application is largely a choice between trade-offs.
The CAP theorem is one concept that can help you understand the trade-offs between different databases.
  • originally conceptualised around network shared data
  • often used to generalize tradeoffs between different databases

CAP theorem states that it is impossible for a distributed software system to simultaneously provide more than two out of three of the following guarantees (CAP): Consistency, Availability, and Partition tolerance. When we design a distributed system, trading off among CAP is almost the first thing we want to consider. CAP theorem says while designing a distributed system we can pick only two of the following three options:

Consistency: All nodes see the same data at the same time. Consistency is achieved by updating several nodes before allowing further reads.

Availability: Every request gets a response on success/failure. Availability is achieved by replicating the data across different servers.

Partition tolerance: The system continues to work despite message loss or partial failure. A system that is partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.

visualization-of-cap-theorem

CAP theorem and SQL vs NoSQL

  • Relational databases tend towards consistency and availability. Partition tolerance is something that relational databases typically don’t handle very well.
  • NoSQL databases trend towards partition tolerance. They are designed with the idea in mind that you’re going to be adding more nodes to your database as it grows.

Does CAP matter to you?

In some instances CAP theorem may not apply to your application.
  • Depending on the size of your application, CAP tradeoffs may be irrelevant
  • small or low traffic website may not require partitions
  • consistency tradeoffs may not be usable in some cases. for instance: votes on a comment may not show right away for all users
CAP theorem can be used a guide for categorising the tradeoffs between different databases.
CAP theorem can help you prioritize on what properties you may want in your database.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s