Monday, May 18, 2015

Cassandra | Setup

We learnt about Cassandra in the previous post. Here we will set up a fully distributed Cassandra cluster and run a client against it.

Installation

We will install a fully distributed Cassandra cluster on three nodes. The complete setup uses the following details:

  • Installation base directory:
      • /home/anishsneh/installs
  • Installation user name:
      • anishsneh
  • Hostnames: 
      • server01 (first node, say with IP address 172.16.70.131)
      • server02 (second node, say with IP address 172.16.70.132)
      • server03 (third node, say with IP address 172.16.70.133)
Note that in Cassandra all the nodes are equal; there is no MASTER or SLAVE, and hence NO SINGLE POINT OF FAILURE.

  • Install Cassandra
    • Download the Apache Cassandra binary package from the Apache website.
    • Extract the downloaded package to /home/anishsneh/installs, such that we have:
      [anishsneh@server01 installs]$ ls -ltr apache-cassandra-2.1.5/
      total 360
      -rw-r--r--. 1 anishsneh anishsneh   2117 Apr 27 07:33 NOTICE.txt
      -rw-r--r--. 1 anishsneh anishsneh  64431 Apr 27 07:33 NEWS.txt
      -rw-r--r--. 1 anishsneh anishsneh  11609 Apr 27 07:33 LICENSE.txt
      -rw-r--r--. 1 anishsneh anishsneh 245971 Apr 27 07:33 CHANGES.txt
      drwxr-xr-x. 2 anishsneh anishsneh   4096 May 17 15:37 interface
      drwxr-xr-x. 4 anishsneh anishsneh   4096 May 17 15:37 javadoc
      drwxr-xr-x. 3 anishsneh anishsneh   4096 May 17 15:37 lib
      drwxr-xr-x. 3 anishsneh anishsneh   4096 May 17 15:37 pylib
      drwxr-xr-x. 4 anishsneh anishsneh   4096 May 17 15:37 tools
      drwxr-xr-x. 2 anishsneh anishsneh   4096 May 17 15:37 bin
      drwxrwxr-x. 2 anishsneh anishsneh   4096 May 17 15:51 logs
      drwxrwxr-x. 5 anishsneh anishsneh   4096 May 17 15:51 data
      drwxr-xr-x. 3 anishsneh anishsneh   4096 May 17 16:46 conf
      
    • Repeat the above steps on all three nodes.
  • Configure Cluster 
    • Set CASSANDRA_HOME="/home/anishsneh/installs/apache-cassandra-2.1.5" in ~/.bashrc (or wherever environment variables are maintained), then reload the profile/bash.
    • On the first node, edit $CASSANDRA_HOME/conf/cassandra.yaml and set the following:
      cluster_name: 'HELLO_CLUSTER'
      
      listen_address: 172.16.70.131
      
      rpc_address: 172.16.70.131
      
      seeds: "172.16.70.131,172.16.70.132,172.16.70.133"
      
      Here we assume the first node has the IP address 172.16.70.131. In cassandra.yaml the seeds entry is nested under seed_provider, inside its parameters list. Other properties, such as data_file_directories and commitlog_directory, can be changed if needed.
    • On the first node, uncomment/update the following properties in the script $CASSANDRA_HOME/conf/cassandra-env.sh:
      JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=172.16.70.131"
      
      LOCAL_JMX=no
      
      JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
      
      Here again we assume the first node has the IP address 172.16.70.131. Note that disabling JMX authentication is suitable only for a test setup like this one.
    • Repeat the above steps on all three nodes (with their respective IP addresses).
  • Start/Run Cluster
    • Execute $CASSANDRA_HOME/bin/cassandra on all three nodes. This starts the Cassandra server on each node, and the three servers join one cluster (as per the information provided in cassandra.yaml).
  • Verify Cluster
    • On one of the nodes, go to $CASSANDRA_HOME/bin and execute the following command:
      [anishsneh@server01 bin]$ ./nodetool -h server01 status
      Datacenter: datacenter1
      =======================
      Status=Up/Down
      |/ State=Normal/Leaving/Joining/Moving
      --  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
      UN  172.16.70.131  188.19 KB  256     64.9%             0fc990e2-c257-4dfc-aec0-b151efd634d7  rack1
      UN  172.16.70.132  187.5 KB   256     67.8%             ba280c97-295c-4056-85f0-3c11594a3676  rack1
      UN  172.16.70.133  153.47 KB  256     67.3%             3a670717-401c-419a-8b89-73c1426df67b  rack1
      
      We may execute a few more commands, like:
      [anishsneh@server01 bin]$ ./nodetool -h server01 version
      ReleaseVersion: 2.1.5
      
      
      [anishsneh@server01 bin]$ ./nodetool -h server01 info
      ID                     : 0fc990e2-c257-4dfc-aec0-b151efd634d7
      Gossip active          : true
      Thrift active          : true
      Native Transport active: true
      Load                   : 188.19 KB
      Generation No          : 1431991363
      Uptime (seconds)       : 537
      Heap Memory (MB)       : 84.14 / 484.00
      Off Heap Memory (MB)   : 0.00
      Data Center            : datacenter1
      Rack                   : rack1
      Exceptions             : 0
      Key Cache              : entries 11, size 824 bytes, capacity 24 MB, 21 hits, 38 requests, 0.553 recent hit rate, 14400 save period in seconds
      Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
      Counter Cache          : entries 0, size 0 bytes, capacity 12 MB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
      Token                  : (invoke with -T/--tokens to see all 256 tokens)
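The Configure Cluster step above repeats the same edits on each node with only the IP address changing. As a rough illustration, the per-node values could be derived from one node list. This is only a sketch in Python: the cluster name and addresses come from this post, while the helper names (yaml_settings, env_settings, render) are hypothetical, and it only prints a checklist rather than editing any file.

```python
# Sketch: derive the per-node settings described in the Configure Cluster step.
# Hostnames, IP addresses and cluster name are taken from the post;
# the helper names below are hypothetical and exist only for illustration.

NODES = {
    "server01": "172.16.70.131",
    "server02": "172.16.70.132",
    "server03": "172.16.70.133",
}
CLUSTER_NAME = "HELLO_CLUSTER"


def yaml_settings(node_ip, all_ips, cluster_name=CLUSTER_NAME):
    """Return the cassandra.yaml values to set on one node, as ordered pairs."""
    return [
        ("cluster_name", "'%s'" % cluster_name),
        ("listen_address", node_ip),
        ("rpc_address", node_ip),
        # The post lists every node as a seed; the same list goes on each node.
        ("seeds", '"%s"' % ",".join(all_ips)),
    ]


def env_settings(node_ip):
    """Return the cassandra-env.sh lines to uncomment/update on one node."""
    return [
        'JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=%s"' % node_ip,
        "LOCAL_JMX=no",
        'JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"',
    ]


def render(host):
    """Render a plain-text checklist of both files for one host."""
    ip = NODES[host]
    lines = ["# %s: cassandra.yaml" % host]
    lines += ["%s: %s" % pair for pair in yaml_settings(ip, list(NODES.values()))]
    lines += ["# %s: cassandra-env.sh" % host]
    lines += env_settings(ip)
    return "\n".join(lines)


if __name__ == "__main__":
    for host in NODES:
        print(render(host))
        print()
```

Running the script prints one checklist per host, which can be compared against the files on that node before starting the cluster.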
      
      

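The nodetool status check from the Verify Cluster step can also be scripted. The following is a minimal sketch in Python, assuming the output keeps the column layout shown above (a two-letter status/state code followed by the node address). The function names are hypothetical, and SAMPLE is simply the output copied from this post.

```python
import re
import subprocess

# Output copied from the post, used here as sample input.
SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.16.70.131  188.19 KB  256     64.9%             0fc990e2-c257-4dfc-aec0-b151efd634d7  rack1
UN  172.16.70.132  187.5 KB   256     67.8%             ba280c97-295c-4056-85f0-3c11594a3676  rack1
UN  172.16.70.133  153.47 KB  256     67.3%             3a670717-401c-419a-8b89-73c1426df67b  rack1
"""

EXPECTED = ["172.16.70.131", "172.16.70.132", "172.16.70.133"]

# A node row starts with U/D (up/down) plus N/L/J/M (state), then an IPv4 address.
ROW = re.compile(r"^([UD][NLJM])\s+(\d{1,3}(?:\.\d{1,3}){3})\s")


def parse_status(text):
    """Map each node address to its two-letter status/state code."""
    nodes = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            nodes[match.group(2)] = match.group(1)
    return nodes


def nodes_not_normal(text, expected):
    """Return the expected addresses that are missing or not in state UN."""
    states = parse_status(text)
    return [ip for ip in expected if states.get(ip) != "UN"]


def fetch_status(nodetool="./nodetool", host="server01"):
    """Run 'nodetool -h <host> status' and return its output as text."""
    return subprocess.check_output(
        [nodetool, "-h", host, "status"], universal_newlines=True
    )


if __name__ == "__main__":
    print(nodes_not_normal(SAMPLE, EXPECTED))
```

Against a live cluster, nodes_not_normal(fetch_status(), EXPECTED) would be run from $CASSANDRA_HOME/bin; an empty list means all three nodes report Up and Normal.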
CQLSH Client

Cassandra ships with CQLSH, a very useful interactive command line shell for CQL (Cassandra Query Language). Here we will connect to the Cassandra cluster using CQLSH and execute various CRUD operations. CQLSH can be launched with the $CASSANDRA_HOME/bin/cqlsh script on any of the nodes (or wherever Cassandra is installed):
[anishsneh@server01 bin]$ ./cqlsh server01
Connected to HELLO_CLUSTER at server01:9042.
[cqlsh 5.0.1 | Cassandra 2.1.5 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
Create KEYSPACE:
cqlsh> CREATE KEYSPACE IF NOT EXISTS demo_keyspace WITH replication={'class' : 'SimpleStrategy', 'replication_factor':1};
Use the created KEYSPACE:
cqlsh> USE demo_keyspace;
Create COLUMN FAMILY (a table in CQL):
cqlsh:demo_keyspace> CREATE TABLE IF NOT EXISTS demo_table(id varchar, login varchar, full_name varchar, country_code varchar, PRIMARY KEY(id));
Describe KEYSPACE:
cqlsh:demo_keyspace> DESCRIBE KEYSPACE demo_keyspace;

CREATE KEYSPACE demo_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'}  AND durable_writes = true;

CREATE TABLE demo_keyspace.demo_table (
    id text PRIMARY KEY,
    country_code text,
    full_name text,
    login text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Insert records into COLUMN FAMILY:
cqlsh:demo_keyspace> INSERT INTO demo_table(id, login, full_name, country_code) values('USR0000001', 'anishsneh', 'Anish Sneh', 'IN');
cqlsh:demo_keyspace> INSERT INTO demo_table(id, login, full_name, country_code) values('USR0000002', 'rakeshk', 'Rakesh K', 'UK');
cqlsh:demo_keyspace> INSERT INTO demo_table(id, login, full_name, country_code) values('USR0000003', 'ballys', 'Bally S', 'US');
cqlsh:demo_keyspace> INSERT INTO demo_table(id, login, full_name, country_code) values('USR0000004', 'yogeshd', 'Yogesh D', 'US');
Select records from COLUMN FAMILY:
cqlsh:demo_keyspace> SELECT * FROM demo_table;

 id         | country_code | full_name  | login
------------+--------------+------------+-----------
 USR0000001 |           IN | Anish Sneh | anishsneh
 USR0000004 |           US |   Yogesh D |   yogeshd
 USR0000003 |           US |    Bally S |    ballys
 USR0000002 |           UK |   Rakesh K |   rakeshk

(4 rows)
Delete record from COLUMN FAMILY:
cqlsh:demo_keyspace> DELETE FROM demo_table WHERE id = 'USR0000002';
cqlsh:demo_keyspace> SELECT * FROM demo_table;

 id         | country_code | full_name  | login
------------+--------------+------------+-----------
 USR0000001 |           IN | Anish Sneh | anishsneh
 USR0000004 |           US |   Yogesh D |   yogeshd
 USR0000003 |           US |    Bally S |    ballys

(3 rows)
Update record in COLUMN FAMILY:
cqlsh:demo_keyspace> UPDATE demo_table SET country_code = 'CA' WHERE id = 'USR0000001';
cqlsh:demo_keyspace> SELECT * FROM demo_table;

 id         | country_code | full_name  | login
------------+--------------+------------+-----------
 USR0000001 |           CA | Anish Sneh | anishsneh
 USR0000004 |           US |   Yogesh D |   yogeshd
 USR0000003 |           US |    Bally S |    ballys

(3 rows)

Cassandra CQL queries can also be run from Java through the DataStax Java driver (a high-level Java client); demo programs can be found at anishsneh@git.
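For comparison, the CRUD steps from the CQLSH session can be replayed from a program. The demos referenced above are Java; what follows is only a minimal sketch in Python, assuming the DataStax Python driver (the cassandra-driver package) is installed and that demo_keyspace and demo_table already exist as created above. The run_crud and main names are hypothetical.

```python
# The statements mirror the CQLSH session; %s is the placeholder style the
# DataStax Python driver uses for simple (non-prepared) statements.
INSERT = ("INSERT INTO demo_table (id, login, full_name, country_code) "
          "VALUES (%s, %s, %s, %s)")
DELETE = "DELETE FROM demo_table WHERE id = %s"
UPDATE = "UPDATE demo_table SET country_code = %s WHERE id = %s"
SELECT = "SELECT id, login, full_name, country_code FROM demo_table"

# Same four records as inserted in the CQLSH session.
USERS = [
    ("USR0000001", "anishsneh", "Anish Sneh", "IN"),
    ("USR0000002", "rakeshk", "Rakesh K", "UK"),
    ("USR0000003", "ballys", "Bally S", "US"),
    ("USR0000004", "yogeshd", "Yogesh D", "US"),
]


def run_crud(session):
    """Replay insert, delete, update and select on a session-like object.

    'session' only needs an execute(query, parameters) method, so a stub
    can stand in for a real driver session.
    """
    for user in USERS:
        session.execute(INSERT, user)
    session.execute(DELETE, ("USR0000002",))
    session.execute(UPDATE, ("CA", "USR0000001"))
    return list(session.execute(SELECT))


def main():
    # Imported here so the helpers above load even without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(["server01"])  # contact point from the post, default port 9042
    session = cluster.connect("demo_keyspace")
    try:
        for row in run_crud(session):
            print(row.id, row.login, row.full_name, row.country_code)
    finally:
        cluster.shutdown()
```

Calling main() against the running cluster should print the remaining rows; passing values as parameters instead of formatting them into the CQL string leaves quoting to the driver.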