Overview

Apache Kafka is an open source distributed publish-subscribe messaging system, rethought as a distributed commit log.

...

Kafka was created at LinkedIn and is now an open source project of the Apache Software Foundation, with Confluent as a major contributor.



Kafka Use Cases

Some common use cases for Kafka:

  • Messaging system
  • Activity tracking
  • Gathering metrics from many different sources
  • Gathering application logs
  • Stream processing (with the Kafka Streams API or Spark, for example)
  • Decoupling of system dependencies
  • Integration with Spark, Flink, Storm, Hadoop and many other Big Data technologies


Architecture




  1. Source Connectors pull data from sources
  2. The data is sent to the Kafka cluster
  3. Topic data can be transformed into another topic with Streams
  4. Sink Connectors in the Connect cluster pull data from Kafka
  5. Sink Connectors push the data to sinks
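
The five steps above can be sketched as a toy, in-memory model. This is only an illustration of the data flow and does not use the Kafka Connect or Kafka Streams APIs. The names `cluster`, `source_connector`, `stream_transform` and `sink_connector`, as well as the topic names, are hypothetical.

```python
from collections import defaultdict

# In-memory stand-in for a Kafka cluster: topic name -> list of records.
# (Assumption for illustration only; a real cluster is a set of brokers.)
cluster = defaultdict(list)


def source_connector(source_rows, topic):
    """Steps 1-2: pull rows from a source and send them to a topic."""
    for row in source_rows:
        cluster[topic].append(row)


def stream_transform(in_topic, out_topic, fn):
    """Step 3: derive a new topic by transforming an existing one."""
    for record in cluster[in_topic]:
        cluster[out_topic].append(fn(record))


def sink_connector(topic, sink):
    """Steps 4-5: pull records from a topic and push them to a sink."""
    for record in cluster[topic]:
        sink.append(record)


source_rows = ["alice", "bob"]  # hypothetical source data
sink = []                       # hypothetical sink (e.g. a database table)

source_connector(source_rows, "users")
stream_transform("users", "users-upper", str.upper)
sink_connector("users-upper", sink)

print(sink)  # ['ALICE', 'BOB']
```

Note that the original topic is left untouched: the transformation writes its results to a second topic, and the sink reads only from that one.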


Kafka 

Topics and Partitions

Topics: a particular stream of data

  • similar to a table in a database (without constraints)
  • you can have as many topics as you want
  • a topic is identified by its name



Topics are split into partitions

  • each partition is ordered
  • each message within a partition gets an incremental id, called the offset
  • offsets are only meaningful within a particular partition
  • order is guaranteed only within a partition (not across partitions)
  • without a key, the producer picks the partition itself, so related messages may land on different partitions
  • you can have as many partitions per topic as you want
  • specifying a key ensures that messages with the same key are written to the same partition (which preserves their order)
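
As a rough illustration of the points above, here is a minimal in-memory sketch of keyed partition assignment and per-partition offsets. It is not the Kafka client: the `Topic` class is hypothetical, and it uses CRC32 as a stand-in hash and simple round-robin for keyless messages, which is an assumption and not necessarily what a real Kafka producer does.

```python
import zlib
from itertools import count


class Topic:
    """Toy topic: a fixed number of append-only partitions (lists)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]
        self._round_robin = count()

    def choose_partition(self, key):
        if key is None:
            # No key: spread messages across partitions (assumed round-robin).
            return next(self._round_robin) % len(self.partitions)
        # Same key -> same hash -> same partition.
        # CRC32 is a stand-in hash chosen for this sketch.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, value, key=None):
        partition = self.choose_partition(key)
        log = self.partitions[partition]
        offset = len(log)  # offsets count up independently per partition
        log.append((key, value))
        return partition, offset


topic = Topic("orders", num_partitions=3)
first = topic.append("created", key="order-42")
second = topic.append("paid", key="order-42")

# Both messages share a key, so they share a partition and stay in order.
print(first, second)
```

Because offsets are tracked per partition, the same offset value can appear in several partitions of one topic. An offset identifies a message only together with its partition.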


Installation on Kubernetes

Installing Kafka Cluster

We are using the Bitnami Helm chart:

...

Code Block
$ kubectl get pods

NAME                     READY   STATUS    RESTARTS   AGE
kafka-0                  1/1     Running   3          3h9m
kafka-zookeeper-0        1/1     Running   0          3h9m


Installing Kafka Connect Cluster

...



References

...