Apache Kafka

Apache Kafka originated at LinkedIn and became an open-source Apache project in 2011.

It became a first-class (top-level) Apache project in 2012.

Kafka is written in Scala and Java.

Apache Kafka is a publish-subscribe based, fault-tolerant messaging system.

Apache Kafka is fast, scalable and distributed by design.


What is a Messaging System?

A Messaging System is responsible for transferring data from one application to another.

Distributed messaging is based on the concept of reliable message queuing. 

Messages are queued asynchronously between client applications and the messaging system.

Two types of messaging patterns are available:

1) Point-to-point

Messages are persisted in a queue.

One or more consumers can consume the messages in the queue, but each message can be consumed by at most one consumer.

Once a consumer reads a message in the queue, it disappears from that queue.
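
The point-to-point behaviour described above can be sketched with a small in-memory queue. This is only an illustration of the pattern, not how any real broker is implemented; the class name PointToPointQueue and the message names are made up for the example.

```python
from collections import deque


class PointToPointQueue:
    """Toy in-memory queue: each message goes to at most one consumer."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Removing the message on read is what makes it "disappear" from the queue.
        return self._messages.popleft() if self._messages else None


work_queue = PointToPointQueue()
for order in ("order-1", "order-2", "order-3"):
    work_queue.send(order)

# Two consumers read from the same queue; no message is seen by both.
consumer_a = [work_queue.receive(), work_queue.receive()]
consumer_b = [work_queue.receive(), work_queue.receive()]
print(consumer_a)  # ['order-1', 'order-2']
print(consumer_b)  # ['order-3', None]
```

Once consumer_a has taken the first two messages, consumer_b can only get what is left, and then the queue is empty.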

2) Publish-subscribe (pub-sub)

Message producers are called publishers and message consumers are called subscribers.

Messages are persisted in a topic.

Consumers can subscribe to one or more topics and consume all the messages in those topics.
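
In contrast to a queue, every subscriber of a topic sees every message. A minimal sketch of that idea follows, assuming a toy in-memory broker; TopicBroker, the topic name "orders" and the subscriber names are hypothetical, and this toy only delivers messages published after a subscriber has joined.

```python
from collections import defaultdict


class TopicBroker:
    """Toy pub-sub broker: every subscriber of a topic gets every message."""

    def __init__(self):
        # topic name -> list of subscriber inboxes
        self._subscribers = defaultdict(list)

    def subscribe(self, topic):
        inbox = []
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        # Each subscriber receives its own copy; nothing is removed for the others.
        for inbox in self._subscribers[topic]:
            inbox.append(message)


broker = TopicBroker()
billing = broker.subscribe("orders")
analytics = broker.subscribe("orders")

broker.publish("orders", "order-1")
broker.publish("orders", "order-2")

print(billing)    # ['order-1', 'order-2']
print(analytics)  # ['order-1', 'order-2']
```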


Apache Kafka is a distributed publish-subscribe messaging system and a robust queue. It can handle a high volume of data and lets you pass messages from one endpoint to another.

Kafka is suitable for both offline and online message consumption. 

Kafka messages are persisted on the disk and replicated within the cluster to prevent data loss. 

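Kafka has traditionally been built on top of the ZooKeeper synchronization service; newer Kafka releases can run without ZooKeeper using the built-in KRaft mode.

It integrates very well with Apache Storm and Spark for real-time streaming data analysis.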


Benefits:

    Reliability − Kafka is distributed, partitioned, replicated and fault-tolerant.

    Scalability − The Kafka messaging system scales easily without downtime.

    Durability − Kafka uses a distributed commit log, which means messages are persisted on disk as fast as possible; hence it is durable.

    Performance − Kafka has high throughput for both publishing and subscribing to messages. It maintains stable performance even when many TB of messages are stored.
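
The commit-log idea behind the durability point can be illustrated with a tiny append-only file in which each record gets an increasing offset and reading never deletes anything. This is a simplified sketch under stated assumptions, not Kafka's storage format: CommitLog, the file name topic-0.log and the one-record-per-line layout are invented for the example, records are assumed to contain no newlines, and flush policy and replication are not modelled.

```python
import os
import tempfile


class CommitLog:
    """Toy append-only log: one record per line, offset = line index."""

    def __init__(self, path):
        self.path = path
        # Recover the next offset from whatever is already on disk.
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self.next_offset = sum(1 for _ in f)
        else:
            self.next_offset = 0

    def append(self, record):
        # Records are only ever added at the end, never modified in place.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def read_from(self, offset):
        # Reading does not remove records; consumers just remember their offset.
        with open(self.path, "r", encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f][offset:]


log_path = os.path.join(tempfile.mkdtemp(), "topic-0.log")
log = CommitLog(log_path)
log.append("event-a")
log.append("event-b")

# Simulated restart: a fresh object still sees the records written earlier.
reopened = CommitLog(log_path)
last_offset = reopened.append("event-c")
print(last_offset)            # 2
print(reopened.read_from(1))  # ['event-b', 'event-c']
```

Because consumers track their position by offset, the same records can be read again later, which is what makes both online and offline consumption possible.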

Kafka is very fast and, when configured with adequate replication, is designed to avoid downtime and data loss.


Use Cases

    1. Metrics
    2. Log Aggregation Solution
    3. Stream Processing

Kafka is very fast; published benchmarks have reported around 2 million writes/sec on a small cluster.

Kafka persists all data to disk, which in practice means that all writes first go to the page cache of the OS (RAM). This makes it very efficient to transfer data from the page cache to a network socket.
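
The page-cache-to-socket transfer mentioned above is commonly done with the operating system's sendfile facility. A rough Python sketch of the same idea is shown below; it is not Kafka code. The temporary file stands in for a log segment and the socket pair stands in for a broker-to-consumer connection, both of which are assumptions made for the example.

```python
import os
import socket
import tempfile

# A small file standing in for a log segment (hypothetical content).
with tempfile.NamedTemporaryFile(delete=False) as segment:
    segment.write(b"event-a\nevent-b\n")
    segment_path = segment.name

# A connected socket pair stands in for the broker-to-consumer connection.
broker_sock, consumer_sock = socket.socketpair()

with open(segment_path, "rb") as f:
    # socket.sendfile() uses os.sendfile() where the platform supports it, so
    # the kernel can move bytes from the page cache to the socket without
    # copying them through application buffers; otherwise it falls back to
    # ordinary send() calls.
    sent = broker_sock.sendfile(f)

broker_sock.close()

# Read everything the "consumer" side received until the connection closes.
received = b""
while True:
    chunk = consumer_sock.recv(4096)
    if not chunk:
        break
    received += chunk

consumer_sock.close()
os.remove(segment_path)

print(sent)      # 16
print(received)  # b'event-a\nevent-b\n'
```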

Kafka is a distributed streaming platform for real-time data with low latency and high throughput.

Install on Windows:
https://www.geeksforgeeks.org/installation-guide/how-to-install-and-run-apache-kafka-on-windows/


