In the previous post we learnt the basics of Cassandra and the CAP theorem. In this post we will take a closer look at the Cassandra data model and at how Cassandra works.
Data Model
Cassandra can be defined as a hybrid between a key-value and a column-oriented database. In the Cassandra world, a data model can be seen as a map that is distributed across the cluster. In other words, a table in Cassandra is a distributed multi-dimensional map indexed by a key.
Figure: Cassandra Data Model
Let's have a quick look at the key terms again:
- Column
  - A column is the smallest data container in Cassandra. It has the following properties:
    - Name
    - Value
    - Timestamp
- Super Column
  - A super column is a tuple with:
    - Name
    - Value
  - Here the value is a map of many columns.
  - Unlike a regular column, a super column has no timestamp.
- Column Family
  - A column family is a collection of rows and columns.
  - It can be compared to a table in a traditional relational database.
  - Each row in a column family is uniquely identified by a row key, i.e. each key identifies a row with a variable number of elements.
  - Each row can have multiple columns, each of which has a name, a value and a timestamp.
  - Different rows in the same column family need not share the same set of columns (unlike a table in a traditional relational database).
  - A column may be added to one or more rows at any time without affecting the rest of the dataset.
  - A column family can hold billions of columns.
- Super Column Family
  - A super column family is a NoSQL object that contains multiple column families.
  - It is a tuple (pair) consisting of a key and a value, where the key is mapped to a value made up of column families.
  - By analogy with traditional relational databases, it is something like a "view" over more than one table.
- Keyspace
  - A keyspace contains column families just as a database contains tables in the relational world; it is used to group column families together.
  - By analogy with a traditional relational database, a keyspace can be seen as a database schema.
  - It is the outermost grouping of data in the data store.
  - Generally, a cluster has one keyspace per application.
  - A keyspace may include meta information such as the replication factor and data center awareness.
  - Cassandra provides a default keyspace named system for its internals.
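The terms above can be pictured as nested maps. The following is only a minimal in-memory sketch in Python, not how Cassandra stores data on disk. The class names, the `insert` helper, the keyspace name `my_app` and the microsecond timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
import time


@dataclass
class Column:
    """Smallest container: a name, a value and a timestamp."""
    name: str
    value: object
    timestamp: int  # microseconds since the epoch (assumption for this sketch)


@dataclass
class ColumnFamily:
    """Rows keyed by row key; each row is its own map of column name -> Column."""
    name: str
    rows: dict = field(default_factory=dict)

    def insert(self, row_key, column_name, value, timestamp=None):
        # Hypothetical helper: create the row on first use, then set the column.
        ts = timestamp if timestamp is not None else time.time_ns() // 1000
        self.rows.setdefault(row_key, {})[column_name] = Column(column_name, value, ts)


@dataclass
class Keyspace:
    """Outermost grouping: holds column families plus meta information."""
    name: str
    replication_factor: int = 1
    column_families: dict = field(default_factory=dict)


users = ColumnFamily("users")
users.insert("alice", "email", "alice@example.com")
users.insert("alice", "city", "Pune")
users.insert("bob", "email", "bob@example.com")  # bob has no "city" column

app = Keyspace("my_app", replication_factor=3)
app.column_families[users.name] = users

print(sorted(users.rows["alice"]))  # ['city', 'email']
print(sorted(users.rows["bob"]))    # ['email']
```

The two rows live in the same column family but carry different sets of columns, and adding `city` to one row did not touch the other.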
Replication
Replication refers to how many copies of each piece of data we keep in the cluster. It is the process of storing copies of data on multiple nodes to ensure reliability and fault tolerance. Cassandra stores multiple copies of data, known as replicas. We can set the number of replicas when creating a keyspace. The placement of these replicas in the cluster is determined by the replica placement strategy.
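As a rough illustration of setting the number of replicas at keyspace creation, the sketch below builds CQL `CREATE KEYSPACE` statements as plain strings. The helper `create_keyspace_cql`, the keyspace name `my_app` and the data center names `dc1` and `dc2` are hypothetical; in a real deployment the data center names must match the ones the cluster reports. The two strategy names are described in the list that follows.

```python
def cql_literal(value):
    """Render a Python value as a CQL literal (strings are single-quoted)."""
    return f"'{value}'" if isinstance(value, str) else str(value)


def create_keyspace_cql(name, strategy, **options):
    """Build a CREATE KEYSPACE statement with the given replication settings."""
    settings = {"class": strategy, **options}
    body = ", ".join(f"'{key}': {cql_literal(val)}" for key, val in settings.items())
    return f"CREATE KEYSPACE {name} WITH replication = {{{body}}};"


# Single data center: one replication factor for the whole cluster.
print(create_keyspace_cql("my_app", "SimpleStrategy", replication_factor=3))

# Multiple data centers: a replica count per data center (names are made up).
print(create_keyspace_cql("my_app", "NetworkTopologyStrategy", dc1=3, dc2=2))
```

The generated statements could then be run through cqlsh or a client driver.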
- Replication Strategy
  - It determines which nodes hold replicas of a row. For example, SimpleStrategy picks the next N-1 nodes clockwise around the ring from the node that stores the row, where N is the replication factor. Cassandra also provides other strategies that take into account multiple data centers, both local and geographically dispersed.
  - It sets the distribution of the replicas across the nodes in the cluster depending on the cluster's topology.
  - The replication strategy is usually defined when the keyspace is created.
  - Two replication strategies are available:
    - SimpleStrategy
      - SimpleStrategy should be used for a single data center only. With more than one data center, it must not be used.
      - It places the first replica on a node determined by the partitioner. Additional replicas are placed on the next nodes clockwise in the ring without considering topology, i.e. rack or data center location.
    - NetworkTopologyStrategy
      - It is the recommended strategy for most deployments because it generalises replica placement from one data center to N, which makes it much easier to expand to multiple data centers later.
      - NetworkTopologyStrategy should be used when the Cassandra cluster is deployed across multiple data centers. This strategy specifies how many replicas we want in each data center.
      - Within each data center, it places replicas by walking the ring clockwise until it reaches the first node in another rack.
      - NetworkTopologyStrategy tries to place replicas on distinct racks because nodes in the same rack (or similar physical grouping) might fail at the same time due to power, network or other hardware issues.
- Replication Factor
  - The total number of replicas across the cluster is referred to as the replication factor. A replication factor of 1 means that there is only one copy of each row, on one node. A replication factor of 2 means two copies of each row, where each copy is on a different node (the nodes are determined by the replication strategy). Note that all replicas are equally important for failover; there is no primary or master replica (remember, there is no master in a Cassandra cluster).
  - Logically, the replication factor should not exceed the number of nodes in the cluster. However, you can increase the replication factor and then add the desired number of nodes afterwards. When the replication factor exceeds the number of nodes, writes are rejected, but reads are served as long as the desired consistency level can be met.
- Partitioner
  - The partitioner controls how the data is distributed over your nodes. In order to find a set of keys, Cassandra must know which nodes hold the range of values the client is looking for.
  - A partitioner is a hash function that computes the token (hash) of a row key in order to distribute the data across the cluster. Each row of data is uniquely identified by a row key and distributed across the cluster by the value of its token.
  - The token generated by the partitioner is then used by the replication strategy to place the replicas in the cluster.
  - Cassandra offers the following three partitioners out of the box:
    - Murmur3Partitioner (default): uniformly distributes data across the cluster based on MurmurHash hash values.
    - RandomPartitioner: uniformly distributes data across the cluster based on MD5 hash values.
    - ByteOrderedPartitioner: keeps an ordered distribution of data lexically by key bytes.
  - The partitioner is configured in the cassandra.yaml file by setting the partitioner property to one of the following class names:
    - Murmur3Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    - RandomPartitioner: org.apache.cassandra.dht.RandomPartitioner
    - ByteOrderedPartitioner: org.apache.cassandra.dht.ByteOrderedPartitioner
- Consistency Level
  - The consistency level in Cassandra can be seen as how many replicas must respond for a read or write operation to be declared successful.
  - Consistency refers to how up-to-date and synchronized a row of Cassandra data is on all of its replicas.
  - Because Cassandra extends the concept of eventual consistency by offering tunable consistency, the client application decides, for any given read or write operation, how consistent the requested data must be.
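To tie the partitioner, SimpleStrategy-style placement and consistency levels together, here is a toy simulation in Python. It is a sketch under several assumptions: tokens come from an MD5 hash in the spirit of RandomPartitioner (not the default Murmur3Partitioner), the token space is shrunk to 2^32, the four-node ring and its node names are invented, and only the ONE, QUORUM and ALL levels are modelled.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # toy token space; real partitioners use far larger ranges


def token_for(row_key):
    """Hash a row key to a token (MD5-based, in the spirit of RandomPartitioner)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def simple_strategy_replicas(row_key, ring, replication_factor):
    """Pick the first replica by token, then walk the ring clockwise.

    `ring` is a list of (token, node_name) pairs sorted by token.
    Rack and data center location are ignored, as in SimpleStrategy.
    """
    tokens = [token for token, _ in ring]
    # First node whose token is >= the key's token, wrapping past the end.
    start = bisect.bisect_left(tokens, token_for(row_key)) % len(ring)
    # This toy version caps the replica count at the number of nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def required_acks(consistency_level, replication_factor):
    """Number of replicas that must respond for the operation to succeed."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency_level]


# Hypothetical four-node ring with evenly spaced tokens.
ring = [(i * RING_SIZE // 4, f"node{i + 1}") for i in range(4)]

replicas = simple_strategy_replicas("user:42", ring, replication_factor=3)
print(replicas)                    # three consecutive nodes on the ring
print(required_acks("QUORUM", 3))  # 2 of the 3 replicas must respond
```

With a replication factor of 3, a QUORUM read or write needs 2 responses, so the operation still succeeds if one of the three replicas is down.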
We will look at Cassandra's architecture highlights and read/write internals in the next post.
Useful links:
http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/#.VRHe-s2sXrd
http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency