Thursday, September 18, 2014

Flume | Setup

We learnt about Flume in the previous post. Here we will set up and run a Flume agent with an Avro source and a Java-based client.

Installation

For the agent installation we will use one of the three nodes set up earlier for the agent JVM, and all three HDFS nodes for the sink (as described in the earlier post).
We will set up a single agent:

Note that we are using the following details for the installation (for the complete setup):

     - Installation base directory:  
      • /home/anishsneh/installs
     - Installation user name:
      • anishsneh
     - Hostnames: 
      • server01
Steps to install Flume NG agent:
  1. Install Flume - we will use Apache Flume 1.5.0.1 (with Hadoop2)
    • Download apache-flume-1.5.0.1-bin.tar.gz from the Flume website; note that we are using Hadoop 2 for the sink
    • Extract downloaded package to anishsneh@server01:/home/anishsneh/installs, such that we have:
           [anishsneh@server01 installs]$ ls -ltr apache-flume-1.5.0.1-bin
           total 128
           -rw-r--r--.  1 anishsneh anishsneh  1779 Mar 28 15:15 README
           -rw-r--r--.  1 anishsneh anishsneh  6172 Mar 28 15:15 DEVNOTES
           -rw-r--r--.  1 anishsneh anishsneh 22517 May  6 16:29 LICENSE
           -rw-r--r--.  1 anishsneh anishsneh 61591 Jun 10 13:56 CHANGELOG
           -rw-r--r--.  1 anishsneh anishsneh   249 Jun 10 14:08 NOTICE
           -rw-r--r--.  1 anishsneh anishsneh  1591 Jun 10 14:08 RELEASE-NOTES
           drwxr-xr-x. 10 anishsneh anishsneh  4096 Jun 10 15:10 docs
           drwxrwxr-x.  2 anishsneh anishsneh  4096 Sep 17 14:59 lib
           drwxrwxr-x.  2 anishsneh anishsneh  4096 Sep 17 14:59 tools
           drwxr-xr-x.  2 anishsneh anishsneh  4096 Sep 17 14:59 bin
           drwxr-xr-x.  2 anishsneh anishsneh  4096 Sep 17 14:59 conf
          
    • Create hdfs://server01:9000/data/flume directory on HDFS and change its permissions to 777 on server01 (for this demo)
           [anishsneh@server01 installs]$ hadoop fs -mkdir /data/flume
          
           [anishsneh@server01 installs]$ hadoop fs -chmod 777 /data/flume
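With the HDFS directory in place, the agent itself can be described in a small properties file. The sketch below is a minimal assumed configuration (the agent/source/channel/sink names and the port 41414 are illustrative, not from the original post); it wires an Avro source to the HDFS directory created above through an in-memory channel:

```
# conf/flume.conf - minimal sketch; component names and port are assumptions
agent1.sources = avro-source1
agent1.channels = mem-channel1
agent1.sinks = hdfs-sink1

# Avro source listening for events from the Java client
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = server01
agent1.sources.avro-source1.port = 41414
agent1.sources.avro-source1.channels = mem-channel1

# In-memory channel buffering events between source and sink
agent1.channels.mem-channel1.type = memory

# HDFS sink writing to the directory created above
agent1.sinks.hdfs-sink1.type = hdfs
agent1.sinks.hdfs-sink1.channel = mem-channel1
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://server01:9000/data/flume
```

With this file saved as conf/flume.conf, the agent could be started with the standard launcher: bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name agent1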
          

Saturday, September 13, 2014

Kafka | Setup

We learnt about Kafka in the previous post. Here we will set up and run a three node (fully distributed) Kafka cluster.

Installation

For the installation we will use the three CentOS VMs which we configured in an earlier post. We will set up a three node Kafka cluster.

Note that we are using the following details for the installation (for the complete Kafka setup):

     - Installation base directory:  
      • /home/anishsneh/installs
     - Installation user name:
      • anishsneh
     - Hostnames: 
      • server01 (broker 1)
      • server02 (broker 2)
      • server03 (broker 3)
Steps to install Kafka:
  1. Install Kafka - we will use Kafka 0.8.1.1 (kafka_2.10-0.8.1.1.tgz), built with Scala 2.10
    • Download kafka_2.10-0.8.1.1.tgz from the Apache Kafka website
    • Extract downloaded package to /home/anishsneh/installs, such that we have:
      [anishsneh@server01 installs]$ ls -ltr kafka_2.10-0.8.1.1
      total 28
      -rw-rw-r--. 1 anishsneh anishsneh   162 Apr 22 11:37 NOTICE
      -rw-rw-r--. 1 anishsneh anishsneh 11358 Apr 22 11:37 LICENSE
      drwxr-xr-x. 2 anishsneh anishsneh  4096 Apr 22 12:26 libs
      drwxr-xr-x. 2 anishsneh anishsneh  4096 Apr 22 12:26 config
      drwxr-xr-x. 3 anishsneh anishsneh  4096 Apr 22 12:26 bin
      
    • Repeat the above steps on all three hosts.
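After extracting, each broker needs its own config/server.properties. The fragment below is a sketch of the per-broker settings for this layout (the log directory path and the ZooKeeper quorum are assumptions based on the cluster described in the earlier posts):

```
# config/server.properties - per-broker sketch; paths and quorum are assumptions
broker.id=1                    # must be unique: 2 on server02, 3 on server03
port=9092
host.name=server01             # the broker's own hostname on each node
log.dirs=/home/anishsneh/installs/kafka-logs
zookeeper.connect=server01:2181,server02:2181,server03:2181
```

Each broker would then be started with bin/kafka-server-start.sh config/server.properties on its own host.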

Tuesday, September 9, 2014

HBase | Phoenix

We set up and started an HBase cluster in the previous post. Here we will write a JDBC-based client using Apache Phoenix.

Apache Phoenix

Apache Phoenix is a JDBC skin over the HBase client which turns HBase into a SQL-capable database. The driving force behind Phoenix development was to let people work with HBase through a well-understood language like SQL instead of learning another proprietary API. It was originally developed by salesforce.com as a Java/JDBC layer enabling developers to run SQL queries on Apache HBase, and was later open sourced and moved under the Apache umbrella.
As per Apache documentation "Apache Phoenix is a SQL skin over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans and orchestrates the running of those scans to produce regular JDBC result sets."
Phoenix is written entirely in Java and provides a client-embeddable JDBC driver; it has its own query engine, coprocessors and metadata. According to the project's description, Phoenix is used internally by salesforce.com for low latency queries, in the order of milliseconds for simple queries or seconds when tens of millions of rows are processed.
The Phoenix query engine transforms a SQL query into HBase scans, executes them using coprocessors and produces JDBC result sets. Under the hood it compiles queries into native HBase calls (there is NO MapReduce involved).
Note that the Phoenix JDBC driver is developed for, and restricted to, HBase only.
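To make the SQL-to-scan translation concrete, a typical Phoenix session looks like the sketch below (the table and column names are illustrative assumptions, not from the original post); each statement is compiled by the client-embedded driver into native HBase calls:

```
-- Illustrative Phoenix SQL; table and column names are assumptions
CREATE TABLE user_stats (
    host VARCHAR NOT NULL PRIMARY KEY,   -- primary key maps to the HBase row key
    total_visits BIGINT);

-- Phoenix uses UPSERT (insert-or-update) rather than INSERT
UPSERT INTO user_stats VALUES ('server01', 10);

-- Compiled into an HBase scan by the driver, returned as a JDBC result set
SELECT host, total_visits FROM user_stats WHERE total_visits > 5;
```

Note the UPSERT semantics: because HBase puts overwrite existing cells, Phoenix exposes a single insert-or-update statement instead of separate INSERT/UPDATE.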

Sunday, September 7, 2014

HBase | Setup

We learnt about HBase in the previous post. Here we will set up a fully distributed HBase cluster and run a client against it.

Installation

For the installation we will use the three node Hadoop - YARN cluster (as described in the earlier post).
We will setup three node HBase cluster:

Note that we are using the following details for the installation (for the complete setup):

     - Installation base directory:  
      • /home/anishsneh/installs
     - Installation user name:
      • anishsneh
     - Hostnames: 
      • server01 (master+slave)
      • server02 (only slave)
      • server03 (only slave)
Steps to install HBase (on the top of Hadoop 2 cluster):
  1. Install HBase - we will use HBase 0.98.4 (Hadoop2)
    • Download hbase-0.98.4-hadoop2-bin.tar.gz from the HBase website; note that we are using the Hadoop 2 build of the HBase binary
    • Extract downloaded package to /home/anishsneh/installs, such that we have:
      [anishsneh@server01 installs]$ ls -ltr hbase-0.98.4-hadoop2
      total 172
      -rw-r--r--.  1 anishsneh anishsneh    897 Jun  6 10:33 NOTICE.txt
      -rw-r--r--.  1 anishsneh anishsneh  11358 Jun  6 10:33 LICENSE.txt
      -rw-r--r--.  1 anishsneh anishsneh   1377 Jul 14 18:23 README.txt
      drwxr-xr-x.  2 anishsneh anishsneh   4096 Jul 14 18:23 conf
      drwxr-xr-x.  4 anishsneh anishsneh   4096 Jul 14 18:23 bin
      -rw-r--r--.  1 anishsneh anishsneh 134544 Jul 14 18:27 CHANGES.txt
      drwxr-xr-x.  7 anishsneh anishsneh   4096 Jul 14 19:37 hbase-webapps
      drwxr-xr-x. 29 anishsneh anishsneh   4096 Jul 14 19:45 docs
      drwxrwxr-x.  3 anishsneh anishsneh   4096 Sep  7 14:47 lib
      
    • Repeat the above steps on all three hosts.
    • Create hdfs://server01:9000/data/hbase directory on HDFS and change its permissions to 777 (for this demo)
    • Create the /home/anishsneh/installs/tmp/hbase directory on the local file system on all three servers (i.e. server01, server02, server03)
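The directories created above are tied together in conf/hbase-site.xml. The fragment below is a minimal sketch for this layout (the ZooKeeper quorum entry is an assumption based on the three-node cluster used in these posts):

```
<!-- conf/hbase-site.xml - minimal sketch; ZooKeeper quorum is an assumption -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://server01:9000/data/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/home/anishsneh/installs/tmp/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>server01,server02,server03</value>
  </property>
</configuration>
```

The conf/regionservers file would list server01, server02 and server03 (one per line); the cluster is then started from the master with bin/start-hbase.sh.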