Thursday, March 26, 2015

Cassandra | Quick Dive

In the previous post we learnt about the basics of Cassandra and the CAP theorem. In this post we will take a closer look at Cassandra's data model and how Cassandra works.

Data Model

Cassandra can be defined as a hybrid between a key-value and a column-oriented database. In the Cassandra world, the data model can be seen as a map distributed across the cluster. In other words, a table in Cassandra is a distributed multi-dimensional map indexed by a key.

Cassandra Data Model

Let's have a quick look at the key terms again:
  • Column
    • A column can be defined as the smallest container in Cassandra world with following properties:
      • Name
      • Value
      • Timestamp
  • Super Column
    • A super column can be defined as a tuple with:
      • Name
      • Value
    • Here the value maps to many columns.
    • Unlike a normal column, there is no timestamp.
  • Column Family
    • Column family can be defined as a collection of rows and columns.
    • A column family can be compared to a table in a traditional relational database.
    • Each row in a column family is uniquely identified by a row key i.e. each key identifies a row of a variable number of elements.
    • Each row can have multiple columns, each of which has a name, value and a timestamp.
    • Different rows in the same column family need not share the same set of columns (unlike a table in a traditional relational database).
    • A column may be added to one or multiple rows at any time without affecting the complete dataset.
    • In a column family there can be billions of columns.
      Column Family
  • Super Column Family
    • A super column family is a NoSQL object that contains multiple column families.
    • It is a tuple (pair) where the key is mapped to a value consisting of column families.
    • In traditional relational database analogy it can be seen as something like a "view" over more than one table.
      Super Column Family
  • Keyspace
    • A keyspace contains column families, just like a database contains tables in the relational world; keyspaces are used to group column families together.
    • In traditional relational database analogy, keyspaces can be seen as database schemas.
    • It is the outermost grouping of the data in the data store.
    • Generally in a cluster there is one keyspace per application.
    • The keyspace may include meta information such as replication factor and data center awareness.
    • There is a default keyspace named system, provided for Cassandra internals.
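The terms above can be sketched as nested maps. The snippet below is a hypothetical, simplified in-memory model (the keyspace, column family, row keys and column names are all made-up examples); real Cassandra persists these structures very differently, but the shape of the map is the point:

```python
import time

# A simplified sketch of Cassandra's nested-map data model:
# keyspace -> column family -> row key -> column name -> (value, timestamp)
keyspace = {
    "app_keyspace": {                            # keyspace (like a schema)
        "users": {                               # column family (like a table)
            "user:1": {                          # row key
                "name": ("alice", time.time()),  # column: name -> (value, timestamp)
                "email": ("alice@example.com", time.time()),
            },
            "user:2": {                          # rows need not share the same columns
                "name": ("bob", time.time()),
                "city": ("pune", time.time()),
            },
        }
    }
}

# Look up one column value for a row
value, ts = keyspace["app_keyspace"]["users"]["user:1"]["name"]
print(value)  # alice
```

Note how `user:1` and `user:2` carry different sets of columns, which is exactly the schema flexibility described above.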


The term replication refers to the number of copies of each piece of data we need in our cluster. Replication is the process of storing copies of data on multiple nodes to ensure reliability and fault tolerance.

Cassandra stores multiple copies of data known as replicas. We may set the number of replicas while creating a keyspace. Placement of these replicas in the cluster is determined by the replica placement strategy.
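The clockwise walk that places replicas can be sketched in a few lines. This is a hypothetical toy ring (the node names and token values are invented, and tokens are reduced to a single byte for readability), not Cassandra's actual implementation:

```python
import hashlib
from bisect import bisect_left

# Hypothetical ring: token -> node, tokens ordered clockwise.
ring = {0: "node-a", 85: "node-b", 170: "node-c", 220: "node-d"}

def token_for(row_key: str) -> int:
    """Hash a row key onto the ring (MD5-based, in the spirit of
    RandomPartitioner, reduced here to the range 0-255)."""
    return hashlib.md5(row_key.encode()).digest()[0]

def replicas(row_key: str, replication_factor: int) -> list:
    """SimpleStrategy-style placement: the first replica goes to the node
    owning the key's token; the remaining N-1 replicas go to the next
    nodes clockwise around the ring, ignoring rack/data center topology."""
    tokens = sorted(ring)
    token = token_for(row_key)
    # First node whose token is >= the key's token (wrapping around the ring).
    start = bisect_left(tokens, token) % len(tokens)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(replication_factor)]

print(replicas("user:1", 3))  # three distinct nodes, clockwise from the owner
```

With a replication factor of 3 on this four-node ring, every key lands on three distinct consecutive nodes, which is the behaviour SimpleStrategy describes below.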

  • Replication Strategy
    • It determines which nodes hold replicas of a row, e.g. SimpleStrategy picks the next N-1 nodes clockwise around the ring from the node that stores the row. Cassandra also provides other strategies that take into account multiple data centers, both local and geographically dispersed.
    • It sets the distribution of the replicas across the nodes in the cluster depending on the cluster's topology.
    • Replication strategy is usually defined at the time of keyspace creation.
    • Basically, there are two replication strategies available:
      • SimpleStrategy
        • SimpleStrategy should be used for a single data center only; it must not be used when there is more than one data center.
        • It places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology i.e. rack or data center location.
      • NetworkTopologyStrategy
        • It is the recommended strategy for most deployments because it generalises the strategy from one data center to N, making it much easier to expand to multiple data centers when future growth requires it.
        • NetworkTopologyStrategy should be used when the Cassandra cluster is deployed across multiple data centers. This strategy specifies how many replicas we want in each data center.
        • It places replicas in the same data center by walking the ring clockwise until it reaches the first node in another rack.
        • NetworkTopologyStrategy tries to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) might fail at the same time due to power, network or other hardware issues.
  • Replication Factor
    • The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node (the node is determined by the replication strategy). Note that all replicas are equally important for failover; there is no primary or master replica (remember there is no master in a Cassandra cluster).
    • Logically, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes afterwards. When replication factor exceeds the number of nodes, writes are rejected, but reads are served as long as the desired consistency level can be met.
  • Partitioner
    • The partitioner controls how the data is distributed over the nodes. In order to find a set of keys, Cassandra must know which nodes have the range of values the client is looking for.
    • A partitioner is a hash function that calculates the token (hash) of a row key, which is used to distribute the data in the cluster. Each row of data is uniquely identified by a row key and distributed across the cluster by the value of the token.
    • The token generated by the partitioner is further used by the replication strategy to place the replicas in the cluster.
    • Cassandra offers the following three partitioners out of the box:
      • Murmur3Partitioner (default): uniformly distributes data across the cluster based on MurmurHash hash values.
      • RandomPartitioner: uniformly distributes data across the cluster based on MD5 hash values.
      • ByteOrderedPartitioner: keeps an ordered distribution of data lexically by key bytes.
    • Partitioners are configured in the cassandra.yaml file as follows:
      • Murmur3Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
      • RandomPartitioner: org.apache.cassandra.dht.RandomPartitioner
      • ByteOrderedPartitioner: org.apache.cassandra.dht.ByteOrderedPartitioner
  • Consistency Level
    • Consistency level in Cassandra can be seen as how many replicas must respond to declare a read or write operation successful.
    • Consistency refers to how up-to-date and synchronized a row of Cassandra data is on all of its replicas.
    • Since Cassandra extends the concept of eventual consistency by offering tunable consistency, for any given read or write operation the client application decides how consistent the requested data must be.
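The tunable-consistency idea can be illustrated with a little arithmetic. The sketch below models only the counting argument behind quorum reads and writes (Cassandra's actual consistency levels include ONE, QUORUM, ALL and more): whenever the number of replicas a read contacts plus the number a write contacts exceeds the replication factor, the two sets must overlap, so the read is guaranteed to see the latest write.

```python
def quorum(replication_factor: int) -> int:
    """Replicas a QUORUM operation must reach: floor(RF/2) + 1."""
    return replication_factor // 2 + 1

def overlapping(read_replicas: int, write_replicas: int, replication_factor: int) -> bool:
    """Reads see the latest write whenever R + W > RF, because the
    read set and the write set then share at least one replica."""
    return read_replicas + write_replicas > replication_factor

rf = 3
r = w = quorum(rf)               # QUORUM on both sides with RF=3 -> 2 replicas each
print(r, overlapping(r, w, rf))  # 2 True
```

Lowering either side (say, reads at ONE) trades this guarantee away for lower latency, which is exactly the choice tunable consistency hands to the client application.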

We will look at Cassandra's architecture highlights and read/write internals in the next post.

Useful links: