Cassandra is a free and open-source distributed database management system used to manage large data sets across many nodes. In Cassandra, every node is connected to the other nodes, and any node can accept both read and write operations. It has no single point of failure: if a node goes down, another node that holds a copy of the same data serves requests in its place.
In Cassandra, nodes also act as replicas: each piece of data is copied to several nodes, and the number of copies is set by the replication factor.
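A minimal sketch of how replicas can be placed on a token ring, assuming a SimpleStrategy-style rule: the first replica goes to the node that owns the key's token, and the remaining replicas go to the next nodes clockwise. The node names, their token values, and the MD5-based `token` function are hypothetical choices for illustration; real clusters use a partitioner such as Murmur3 and usually give each node many virtual tokens.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a partition key to a 32-bit token (MD5 stands in for a real partitioner)."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big")


class TokenRing:
    """Toy token ring: each node owns one token; a key belongs to the next token clockwise."""

    def __init__(self, nodes: dict[str, int]):
        # nodes: hypothetical node name -> the token it owns on the ring
        self._ring = sorted((tok, name) for name, tok in nodes.items())
        self._tokens = [tok for tok, _ in self._ring]

    def replicas(self, key: str, replication_factor: int) -> list[str]:
        """Return the owning node plus the next nodes clockwise (SimpleStrategy-style)."""
        count = min(replication_factor, len(self._ring))
        # first node whose token is >= the key's token; wrap past the end to the start
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = TokenRing({"node-a": 0, "node-b": 2**30, "node-c": 2**31, "node-d": 3 * 2**30})
print(ring.replicas("user:42", replication_factor=3))
```

With a replication factor of 3, the key lives on three consecutive nodes of the ring, so if any one of them goes down the other two still hold the data.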
Components of Cassandra:
The main components of Cassandra are as follows:
- Node: A node is a single Cassandra instance, the place where data is actually stored. Nodes can be added as needed to increase capacity.
- Data Center: A data center is a collection of related nodes. A data center can contain one or more nodes.
- Cluster: A cluster contains data centers. A cluster can contain one or more data centers.
- Commit Log: The commit log is a crash-recovery mechanism. Every write is appended to the commit log, so writes that have not yet reached an SSTable on disk can be replayed after a crash.
- Memtable: The memtable is an in-memory structure. After a write is recorded in the commit log, the data is also written to the memtable.
- SSTable: An SSTable (sorted string table) is an immutable disk file. When the memtable's contents reach a threshold, they are flushed to a new SSTable.
- Bloom Filter: A Bloom filter is a space-efficient probabilistic data structure used to test whether an element is a member of a set. It can report false positives but never false negatives, so Cassandra keeps one per SSTable to skip files that definitely do not contain the requested data.
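The Bloom filter described above can be sketched as a bit array plus k hash functions. This is a minimal illustration, not Cassandra's implementation: the bit-array size, the number of hashes, and the salted SHA-256 bit positions are assumptions (Cassandra sizes its filters from the per-table `bloom_filter_fp_chance` setting).

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array plus k hash functions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits      # assumed size of the bit array
        self.num_hashes = num_hashes  # assumed number of hash functions (k)
        self.bits = 0                 # a Python int used as the bit array

    def _positions(self, item: str):
        # derive k bit positions by salting SHA-256 with the hash index
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        # set every bit the item hashes to
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present"
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bf = BloomFilter()
for key in ["user:1", "user:2", "user:3"]:
    bf.add(key)
print(bf.might_contain("user:2"))   # True: an added key is never reported missing
print(bf.might_contain("user:99"))  # usually False; True here would be a false positive
```

Because a False answer is always correct, a read can safely skip any SSTable whose filter rejects the key; a True answer only means the SSTable has to be checked.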
Cassandra Query Language (CQL):
CQL is an SQL-like query language used to run queries on a Cassandra database. It is used to create, update, and delete keyspaces, tables, and rows.
When a write operation reaches the Cassandra database, the data is sent to its designated node according to the partition key, using the token ring (consistent hashing). On that node, the write is recorded in the commit log and also stored in the memtable. Whenever the memtable is full, its data is flushed to an SSTable.
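The per-node part of the write path (commit log, then memtable, then flush to an SSTable) can be sketched as a toy class. Routing the write to the right node is left out here. The `Node` class, the entry-count `flush_threshold`, and the use of in-memory lists as stand-ins for files are all hypothetical simplifications; Cassandra decides when to flush based on memory use, not on a key count.

```python
class Node:
    """Toy single-node write path: commit log -> memtable -> SSTable flush."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold             # hypothetical: max memtable keys
        self.commit_log: list[tuple[str, str]] = []        # append-only, for crash recovery
        self.memtable: dict[str, str] = {}                 # in-memory, mutable
        self.sstables: list[list[tuple[str, str]]] = []    # immutable, sorted by key

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. record the write durably first
        self.memtable[key] = value            # 2. then update the in-memory table
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. flush once the threshold is reached

    def flush(self) -> None:
        # write the memtable out as a new sorted, immutable "SSTable"
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable.clear()
        # in this toy, every logged write has now been flushed, so the log can be discarded
        self.commit_log.clear()

    def recover(self) -> None:
        """Rebuild the memtable after a crash by replaying the commit log."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


node = Node(flush_threshold=3)
for k, v in [("a", "1"), ("b", "2"), ("c", "3"), ("d", "4")]:
    node.write(k, v)
print(node.sstables)  # one flushed SSTable holding a, b, c
print(node.memtable)  # d is still only in memory (and in the commit log)
```

Writing to the commit log before the memtable is what makes the memtable safe to lose: after a crash, `recover` rebuilds it from the log.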
For a read operation, Cassandra looks up the value in the memtable and uses the Bloom filters to find out which SSTables may contain the required data, so only those SSTables need to be read.
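A simplified sketch of that read path, assuming each key holds a single value and newer data wins: check the memtable first, then the SSTables from newest to oldest, skipping any SSTable whose Bloom filter says the key is definitely absent. The `SSTable` class, its filter size, and the `read` function are hypothetical; real Cassandra merges cells from all relevant sources by write timestamp rather than stopping at the first match.

```python
from typing import Optional


class SSTable:
    """Stand-in for an immutable SSTable with its own small Bloom filter."""

    NUM_BITS = 256  # hypothetical filter size
    NUM_HASHES = 2  # hypothetical number of hash functions

    def __init__(self, items: dict[str, str]):
        self.data = dict(sorted(items.items()))  # the "file" contents, sorted by key
        self.bloom = 0
        for key in self.data:
            for pos in self._positions(key):
                self.bloom |= 1 << pos

    @classmethod
    def _positions(cls, key: str) -> list[int]:
        # Python's hash() is stable within one process, which is enough for this sketch
        return [hash((i, key)) % cls.NUM_BITS for i in range(cls.NUM_HASHES)]

    def might_contain(self, key: str) -> bool:
        return all((self.bloom >> pos) & 1 for pos in self._positions(key))


def read(key: str, memtable: dict[str, str], sstables: list[SSTable]) -> Optional[str]:
    """Memtable first, then SSTables newest to oldest, skipping filtered-out files."""
    if key in memtable:
        return memtable[key]
    for table in reversed(sstables):  # the newest SSTable is last in the list
        if not table.might_contain(key):
            continue  # "definitely not here": no disk read needed
        if key in table.data:  # the actual "disk read"
            return table.data[key]
    return None


old = SSTable({"user:1": "Alice", "user:2": "Bob"})
new = SSTable({"user:2": "Bobby"})
memtable = {"user:3": "Carol"}
print(read("user:2", memtable, [old, new]))  # Bobby: the newer SSTable wins
print(read("user:1", memtable, [old, new]))  # Alice
```

A false positive from a filter only costs one unnecessary lookup in that SSTable; it never produces a wrong answer.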