Thursday, March 26, 2015

Cassandra | Quick Dive

In the previous post we learnt about the basics of Cassandra and the CAP theorem. In this post we will take a closer look at the Cassandra data model and how Cassandra works.

Data Model

Cassandra can be described as a hybrid between a key-value store and a column-oriented database. In the Cassandra world, the data model can be seen as a map that is distributed across the cluster. In other words, a table in Cassandra is a distributed multi-dimensional map indexed by a key.

Cassandra Data Model
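
The map-of-maps view can be sketched in a few lines of Python. This is only an illustration, not how Cassandra stores data: the table contents, the node names and the helpers get_column and owner_node are made up for the example, and the hash-modulo placement is a toy stand-in for Cassandra's real partitioner.

```python
import hashlib

# Hypothetical table: row key -> {column name -> value}.
# Rows do not need to share the same set of columns.
users = {
    "alice": {"email": "alice@example.com", "city": "Pune"},
    "bob": {"email": "bob@example.com"},
}

# Hypothetical node names for a three-node cluster.
NODES = ["node-1", "node-2", "node-3"]


def get_column(table, row_key, column):
    """Look up one value by row key, then by column name (None if absent)."""
    return table.get(row_key, {}).get(column)


def owner_node(row_key, nodes):
    """Toy placement: hash the row key and map it onto one of the nodes.

    Assumption: simple modulo placement, used here only to show that the
    row key decides where a row lives in the cluster.
    """
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


print(get_column(users, "alice", "city"))  # value stored under that key/column
print(owner_node("alice", NODES))          # same key always maps to same node
```

The point of the sketch is that every lookup starts from the row key: the key picks the node, and within the row the column name picks the value.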

Tuesday, March 24, 2015

Cassandra | Internals

In the previous post we learnt about the Cassandra data model and replication concepts. In this post we will look at the Cassandra architecture and its read/write internals.

Architecture | Highlights

  • Cassandra was designed with real-world system and hardware failures in mind.
  • It is a peer-to-peer distributed system in which all nodes are alike, which results in a read/write-anywhere design.
  • Data is transparently partitioned among all nodes in the cluster.
  • Custom data replication is provided out of the box to ensure fault tolerance.
  • In a Cassandra cluster, each node communicates with the others through the gossip protocol, which exchanges information across the cluster every second.
  • A commit log on each node captures write activity, which ensures data durability.
  • At the same time, data is also written to an in-memory structure (the memtable) and then flushed to disk (as an SSTable) once the memory structure is full.
  • A row in a column family is indexed by its key. Other columns may be indexed as well, since indexes are needed to search Cassandra quickly. Note that in Cassandra an index is effectively another table.
  • Consistency can be chosen between strong and eventual (from all replicas to any node responding), depending on the need. The choice is made on a per-request basis, for both reads and writes.
  • Provides data compression out of the box. It uses Google's Snappy compression algorithm and compresses data at the column family level. There is no known performance penalty for compression.
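
The commit log, memtable and SSTable steps in the list above can be sketched as a small Python class. This is a simplified model under stated assumptions, not Cassandra's implementation: the Node class, its method names and the memtable_limit parameter are hypothetical, the flush threshold counts rows rather than bytes, and reads simply prefer the newest data instead of merging by write timestamp.

```python
class Node:
    """Toy model of the write path on a single node."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only record of writes (durability)
        self.memtable = {}     # in-memory: row key -> {column -> value}
        self.sstables = []     # flushed tables, oldest first, never modified
        self.memtable_limit = memtable_limit  # hypothetical: max rows in memory

    def write(self, row_key, column, value):
        # 1. Record the write in the commit log first.
        self.commit_log.append((row_key, column, value))
        # 2. Apply it to the in-memory memtable.
        self.memtable.setdefault(row_key, {})[column] = value
        # 3. Flush to "disk" once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # An SSTable is written sorted by row key and is immutable afterwards.
        sstable = {key: dict(self.memtable[key]) for key in sorted(self.memtable)}
        self.sstables.append(sstable)
        self.memtable = {}
        # Simplification: flushed writes no longer need the commit log.
        self.commit_log.clear()

    def read(self, row_key, column):
        # Check the memtable first, then SSTables from newest to oldest.
        if column in self.memtable.get(row_key, {}):
            return self.memtable[row_key][column]
        for sstable in reversed(self.sstables):
            if column in sstable.get(row_key, {}):
                return sstable[row_key][column]
        return None


node = Node(memtable_limit=2)
node.write("alice", "city", "Pune")
node.write("bob", "city", "Delhi")      # second row fills the memtable, so it is flushed
node.write("alice", "city", "Mumbai")   # newer value stays in the memtable
print(node.read("alice", "city"))       # prints "Mumbai"
```

Writing to the commit log before touching the memtable is what allows a node to replay writes that had not yet been flushed when it crashed.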