Friday 14 October 2016

Cassandra - Foundation - I

nodetool status

NoSQL [Not Only SQL]
http://www.planetcassandra.org/what-is-apache-cassandra/

Apache Cassandra is a free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

Cassandra’s architecture is responsible for its ability to scale, perform, and offer continuous uptime. Rather than using a legacy master-slave or a manual and difficult-to-maintain sharded architecture, Cassandra has a masterless “ring” design that is elegant, easy to setup, and easy to maintain.

Dynamo-Amazon's Highly available key-value store
replicates data across nodes and clusters to tolerate node failure,
and advocates eventual consistency

Bigtable - a distributed storage system for structured data - from Google
- address data persistence and management on web scale
- provides super read/write performance linear scalability, and continuous availability

Cassandra Architecture


  • Distribution Model - Dynamo
  • Storage Model - BigTable
  • Peer-to-peer [runs on p-p architecture]
  • Column-oriented  - store, provides great schema-less flexibility for application developers
  • Commitlog - from BigTable, which ensures data durability
  • Partition
    •  - Inherited from Dynamo, ability to scale horizontally and incrementally is a key design - dynamically partitions the data over the nodes and clusters to achieve it
  • Replication Strategy - it retains high availability by replicating data at number of nodes by configuring this.
Write Path


When a write occurs, the data will be immediately appended to the commitlog on Disk to ensure durability.
Then cassandra store the data to InMemory store called memtable
When memtable is full it contents will be flushed to SSTable using sequential i/o and commitlog is purged after the flush.
As Random i/o is avoided write performance is very high
commitlog - Writes go here first. Failed nodes can recover from the data in the commit log. A kind of journal
SSTable - Sorted String Table [Disk based immutable data file (table data)]
memtable - A write-back cache of data rows

Read Path

When a read occurs, the data to be returned will be merged from all the related SSTables and memtable. Timestamps will used to determine the up-to-date record
The merge value is also stored in write-through row-cache to improve future read performance

Write/Read tuneable Consistency Levels


Number of replicas that acknowledges a successful write/read

Repair mechanisms
3 repair mechanism to tolerate failure
  • Read repair - When a node is found to have old data during a read it can be brought up to date with the other replicas
  • Hinted handoff - If a write is made and a replica node for the key is down (and hinted_handoff_enabled == true), Cassandra will write a hint to the coordinator node indicating that the write needs to be replayed to the unavailable node
  • Anti-entropy - run by administrators manually                                                                              -Anti-entropy is a process of comparing the data of all replicas and updating each replica to the newest version. nodetool repair

Cassandra Data Model

Storage Model
- column-oriented store
- Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
-Schemaless

Logical Data Structure


Column
- is smallest data model element and storage unit
- is name-value pair plus a Timestamp and an optional TTL [Time-To-Live]
- name, and value are byte arrays corresponds to ColumnKey and ColumnValue of SortedMap
- Timestamp supplied by client which can ignored while data modelling. can we used to resolve transaction conflicts
- TTL - to mark the column as deleted after expiration
Data is marked with a Tombstone after the TTL expires
Tombstones in Cassandra is a Marker indication that the data can be deleted during compaction

CAP TheoremConsistent, Available, Partition tolerant - you may only choose 2 only in a distributed system

No comments:

Post a Comment