Cassandra NoSQL Database

https://en.wikipedia.org/wiki/Apache_Cassandra

  • every node in the cluster has the same role; there is no master
  • no single point of failure
  • supports multi-data-center replication
  • read and write throughput scales linearly with the number of nodes
  • consistency is configurable per query
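
To make the last point more concrete: each read and write can choose how many replicas must answer. A common rule of thumb is that a read is guaranteed to see the latest acknowledged write when the number of replicas written plus the number of replicas read is larger than the replication factor. The following is only a sketch of that arithmetic; ConsistencySketch and its methods are made-up names for illustration, not part of Cassandra or any driver.

```java
// Illustrative only: ConsistencySketch is a hypothetical helper, not a Cassandra API.
final class ConsistencySketch {

    // Number of replicas that form a majority (quorum) for a given replication factor.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when any set of read replicas must overlap with any set of written replicas,
    // so that at least one replica answering the read holds the latest write.
    static boolean readsSeeLatestWrite(int replicationFactor, int writeReplicas, int readReplicas) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("quorum for replication factor 3: " + q);
        System.out.println("quorum write + quorum read overlap: " + readsSeeLatestWrite(rf, q, q));
        System.out.println("single-replica write + read overlap: " + readsSeeLatestWrite(rf, 1, 1));
    }
}
```

Reading and writing with a quorum trades some latency for consistency; reading and writing a single replica is faster but may return stale data.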

Getting started

Run a single-node database via Docker (reachable on localhost, port 9042). Add -d to run it in the background

docker run -p 9042:9042 --name cas0 cassandra

Or start your own cluster (found on https://gokhanatil.com/2018/02/build-a-cassandra-cluster-on-docker.html)

docker run -p 9042:9042 --name cas1 -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter1 cassandra

Once it runs, add more nodes, using the IP of the first one as the seed. Run each node in its own terminal (or add -d). One of the nodes, cas4, is in a different data center

FIRST_IP="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cas1)"
docker run --name cas2 -e CASSANDRA_SEEDS="$FIRST_IP" -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter1 cassandra
FIRST_IP="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cas1)"
docker run --name cas3 -e CASSANDRA_SEEDS="$FIRST_IP" -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter1 cassandra
FIRST_IP="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cas1)"
docker run --name cas4 -e CASSANDRA_SEEDS="$FIRST_IP" -e CASSANDRA_CLUSTER_NAME=MyCluster -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_DC=datacenter2 cassandra

Check the status of your cluster

docker exec -ti cas1 nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.101.0.1  108.61 KiB  256     68.2%             68f10b1a-0313-4fb7-8640-6b1afdab1a5f  rack1
UN  10.101.0.3  69.91 KiB   256     65.1%             eb7d4399-ea6e-4a67-8dca-9a64f318ea8f  rack1
UN  10.101.0.2  93.98 KiB   256     66.8%             c8b4140f-faeb-4f0c-b802-7494364777db  rack1
Datacenter: datacenter2
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack
UJ  10.101.0.4  15.47 KiB   256     ?                 bd40da84-a7ad-4eee-b481-ec8ca2b263c1  rack1

There is also a web UI frontend for Cassandra: https://hub.docker.com/r/delermando/docker-cassandra-web

FIRST_IP="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cas1)"

docker run --name cassandra-web -e CASSANDRA_HOST_IP="$FIRST_IP" -e CASSANDRA_PORT=9042 -e CASSANDRA_USERNAME=cassandra -e CASSANDRA_PASSOWRD=cassandra -p 3000:3000 delermando/docker-cassandra-web:v0.4.0

It will be accessible at http://localhost:3000

Cassandra Terminology

  • node: One running instance of Cassandra
  • cluster: Several nodes
  • datacenter: Several nodes that can exchange data quickly and cheaply
  • Column: The basic data structure of Cassandra with column name, column value, and a timestamp
  • SuperColumn: Like a column, but its values are other columns. This can improve performance if you group columns that you often read together into a SuperColumn
  • Column Family / table: A table with columns and rows. Rows do not need to have all the columns
  • keyspace: Like a database, groups several column families
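
The Column and Column Family entries can be pictured with a small in-memory model. This is only a sketch to illustrate the terminology: the class names are invented, and the rule that the newer timestamp wins is a simplification of how Cassandra reconciles different versions of the same column.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative model only; these types are hypothetical and not part of Cassandra or its drivers.
final class ColumnModelSketch {

    // A column as described above: name, value, and the timestamp of the write.
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // When two versions of a column meet, keep the one with the newer timestamp.
    // Simplification: on equal timestamps the first version is kept.
    static Column reconcile(Column a, Column b) {
        return b.timestamp > a.timestamp ? b : a;
    }

    // A sparse row: only the columns that were actually written are stored.
    static Map<String, Column> row(Column... columns) {
        Map<String, Column> row = new HashMap<>();
        for (Column c : columns) {
            row.merge(c.name, c, ColumnModelSketch::reconcile);
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Column> john = row(
                new Column("name", "John", 1L),
                new Column("email", "old@example.com", 1L),
                new Column("email", "new@example.com", 2L));
        Map<String, Column> jane = row(new Column("name", "Jane", 1L));

        System.out.println("John's email: " + john.get("email").value);
        System.out.println("Jane has an email column: " + jane.containsKey("email"));
    }
}
```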

Creating data structures

You can enter the statements via the web UI or via the cqlsh command on any of the cluster nodes, for example through Docker

docker exec -ti cas1 cqlsh

Create a keyspace where every row is stored on two nodes in datacenter1 and on one node in datacenter2

CREATE KEYSPACE keyspacetest1
WITH replication = {
        'class' : 'NetworkTopologyStrategy',
        'datacenter1' : 2,
        'datacenter2' : 1
};
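
What these replication settings mean for the number of copies and for quorum sizes can be worked out with a few lines of arithmetic. The sketch below assumes that a quorum is a majority of the replicas in scope (all data centers for QUORUM, one data center for LOCAL_QUORUM); ReplicationSketch is an invented name for illustration, not a driver class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: ReplicationSketch is a hypothetical helper, not a Cassandra API.
final class ReplicationSketch {

    // Majority of the given number of replicas.
    static int quorum(int replicas) {
        return replicas / 2 + 1;
    }

    // Total number of copies of each row across all data centers.
    static int totalReplicas(Map<String, Integer> replicasPerDatacenter) {
        int total = 0;
        for (int n : replicasPerDatacenter.values()) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        // Same settings as the keyspacetest1 keyspace above.
        Map<String, Integer> replication = new LinkedHashMap<>();
        replication.put("datacenter1", 2);
        replication.put("datacenter2", 1);

        int total = totalReplicas(replication);
        System.out.println("copies of each row: " + total);
        System.out.println("replicas needed for QUORUM: " + quorum(total));
        for (Map.Entry<String, Integer> e : replication.entrySet()) {
            System.out.println("replicas needed for LOCAL_QUORUM in " + e.getKey() + ": " + quorum(e.getValue()));
        }
    }
}
```

With only one replica in datacenter2, a LOCAL_QUORUM request there depends on that single node being up.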

Create a keyspace where every row is stored on two nodes, regardless of data center

CREATE KEYSPACE keyspacetest2
WITH replication = {
        'class': 'SimpleStrategy',
        'replication_factor' : 2
};

Create a simple table

CREATE TABLE keyspacetest2.people (
        id INT PRIMARY KEY,
        name text
);

Data

INSERT INTO keyspacetest2.people (id, name) VALUES(1, 'John');
INSERT INTO keyspacetest2.people (id, name) VALUES(2, 'Doe');
INSERT INTO keyspacetest2.people (id, name) VALUES(3, 'Jane');
INSERT INTO keyspacetest2.people (id, name) VALUES(4, 'Frank');
SELECT * FROM keyspacetest2.people;

Use collection types in a column

CREATE TABLE keyspacetest1.people2  (id INT, NAME text, EMAIL LIST<text>, PRIMARY KEY(id) );
INSERT INTO keyspacetest1.people2 (id,name,email) VALUES(1, 'John',['test@example.com', 'test2@example.com']);
UPDATE keyspacetest1.people2 SET email=email+['foo@example.com'] WHERE id=1;
// the row now holds email ["test@example.com","test2@example.com","foo@example.com"] and name John

Java Integration

There are several Java clients: http://cassandra.apache.org/doc/latest/getting_started/drivers.html#java

DataStax Java driver for Cassandra

https://github.com/datastax/java-driver

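import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.bindMarker;
import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.insertInto;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.CqlSessionBuilder;
import com.datastax.oss.driver.api.core.cql.BoundStatement;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.querybuilder.QueryBuilder;
import com.datastax.oss.driver.api.querybuilder.insert.RegularInsert;
import com.datastax.oss.driver.api.querybuilder.relation.Relation;
import com.datastax.oss.driver.api.querybuilder.select.Select;
import java.net.InetSocketAddress;
import java.util.List;

// Connect to the cluster via one or more contact points
final List<InetSocketAddress> nodes = ...;

final CqlSessionBuilder builder = CqlSession.builder();
builder.addContactPoints(nodes);
builder.withLocalDatacenter("NameOfYourDataCenter");

final CqlSession session = builder.build();

// Restrict the query to one partition and to a range of the clustering column
final Relation relationA = Relation.column(partitionColumn).isEqualTo(bindMarker());
final Relation minDate = Relation.column(clusterColumn).isGreaterThanOrEqualTo(bindMarker());
final Relation maxDate = Relation.column(clusterColumn).isLessThanOrEqualTo(bindMarker());

// selectFrom takes the keyspace name and the table name
Select query = QueryBuilder
        .selectFrom(myKeyspace, myTable)
        .column(myDataColumn)
        .where(relationA)
        .where(minDate)
        .where(maxDate);

// Prepare once, then bind the values in the order of the bind markers and execute
PreparedStatement statement = session.prepare(query.build());
BoundStatement bound = statement.bind("A", now, later);
ResultSet rows = session.execute(bound);

// Insert one row with two bound values
final RegularInsert insert = insertInto(myKeyspace, myTable)
        .value(myDataColumn1, bindMarker())
        .value(myDataColumn2, bindMarker());
PreparedStatement statement2 = session.prepare(insert.build());
session.execute(statement2.bind("A", "B"));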