Cloud Native Architecture
We approached CrystalDB with a blank sheet of paper, asking what the right architecture for a cloud native OLTP database should be.
Core requirements
In when designing our architecture, we identified five core requirements for a cloud native database:
- Shared data architecture. Separating compute from storage makes flexible resource proportionality possible. Since compute, memory, and storage needs can vary independently, decoupling the design makes it possible to provision what is needed for each quickly—without migrating stored data. Our disaggregated storage subsystem, Crystal File System (CFS), is designed specifically to meet the needs of OLTP databases and fills a gap between traditional shared file systems (e.g., NFS) and object storage (e.g., S3).
- LSM indexing. The log-structured merge-tree (LSM tree) has become the standard foundation for scalable databases. Its underlying files are written only once, in an append-only fashion, allowing them to be cached and replicated easily. LSM trees are naturally suited to partitioning and repartitioning, and they can be tuned to a wide variety of workloads. They are well suited to SSD storage, which is ubiquitous today.
- Distributed transaction avoidance. The fastest way to run a transaction in a distributed database is to find a way to run it all in one place, i.e., to run it as a local transaction. For strong consistency models, physics dictates this. However, not all distributed transactions are equally problematic. By analyzing a workload, we identify those distributed transactions that cause contention and find the partitioning that minimizes them.
- Multimodal concurrency control. Distributed databases and centralized databases have evolved to use different concurrency control protocols. For example, two-phase locking (2PL), a well-understood protocol that is proven in distributed databases, performs less well on a single node than snapshot isolation (SI) or serializable snapshot isolation (SSI), the protocols used by PostgreSQL. Multimodal concurrency control supports both distributed transactions and local transactions and preserves the performance of local transactions, even when distributed transactions enter the mix.
- Distributed query processing. A lightweight routing layer sends single-node transactions directly to the servers responsible for processing them. A distributed query processor extends PostgreSQL’s query planner and two-phase locking (2PL) mechanism to create first-class support for distributed transactions.
Additional background and motivation for the architecture can be found in the CrystalDB Intro Brief (7 pages).