In Apache Kafka, several key components work together to deliver
high throughput, fault tolerance, and scalability in distributed
messaging systems. The main components are topics, partitions, producers,
consumer groups, brokers, and the Kafka cluster (sometimes loosely called
a Kafka group). Here’s how these components relate to
each other:
1. Kafka Cluster (Kafka Group)
A Kafka cluster is a collection of Kafka brokers
that manage the data streams. It acts as the central point where data is
produced and consumed. Each Kafka cluster can have multiple brokers to handle
various topics and partitions, ensuring that the system scales horizontally.
- Cluster: A Kafka cluster consists of multiple Kafka brokers that work together.
- High Availability: The cluster provides high availability and fault tolerance by
  replicating data across multiple brokers.
2. Broker
A broker is a Kafka server that stores data and serves requests
from client applications (producers and consumers).
Kafka brokers are responsible for:
- Storing messages in topics and partitions.
- Handling write requests from producers and fetch requests from consumers.
- Hosting a share of the cluster’s partitions, so that load is balanced and
  faults are tolerated.
- Replication: Each partition of a topic can be replicated across multiple
  brokers for fault tolerance. If a broker fails, another broker that holds a
  replica of the partition can take over.
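To picture how replicas end up on different brokers, here is a minimal sketch. It is a toy model, not Kafka's actual assignment logic, which is more involved. The function name place_replicas and the broker names are hypothetical, and the first broker in each list is simply treated as the partition's leader.

```python
def place_replicas(num_partitions, brokers, replication_factor):
    """Toy model: spread each partition's replicas round-robin over brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement = {}
    for p in range(num_partitions):
        # Consecutive brokers (wrapping around) hold the replicas of partition p,
        # so no two replicas of the same partition share a broker.
        placement[p] = [brokers[(p + i) % len(brokers)]
                        for i in range(replication_factor)]
    return placement


placement = place_replicas(num_partitions=3, brokers=["A", "B", "C"],
                           replication_factor=2)
for partition, replicas in placement.items():
    # Assumption of this sketch: the first replica acts as leader.
    print(f"P{partition}: leader={replicas[0]}, followers={replicas[1:]}")
```

Because every partition keeps a copy on more than one broker, losing any single broker in this model still leaves at least one replica of each partition available.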
3. Topic
A topic is a logical channel for categorizing
records. Topics within Kafka are segmented into partitions, which
function as the primary storage components. Kafka topics are created for
various types of data or events, and records produced by producers are
published to a specific topic.
- Topic → Partitions: A topic can have multiple partitions, enabling Kafka to scale
  horizontally across brokers.
- Message Routing: Producers send messages to topics, and consumers subscribe to
  topics to read those messages.
4. Partition
A partition is the unit of parallelism and distribution
in Kafka. Each topic is divided into one or more partitions, and those partitions
are spread across the brokers in the cluster (a single broker usually hosts
several partitions). This helps Kafka achieve high throughput and
fault tolerance.
- Partition → Broker: Each partition has a leader replica on one broker and can be
  replicated to other brokers for fault tolerance.
- Partition → Consumer Group: Partitions are consumed by the consumers in a consumer
  group, with each partition assigned to one consumer of the group.
5. Consumer Group
A consumer group is a group of consumers that work
together to consume messages from Kafka topics. Each consumer in the group is
responsible for reading messages from one or more partitions.
- Group → Partition: Each partition can only be read by one consumer in the group
  at a time. Multiple consumers can be part of a single consumer group, which
  allows Kafka to distribute partitions for parallel processing among these
  consumers.
- Fault Tolerance: If a consumer in a group fails, the group rebalances and another
  consumer takes over the partitions that were assigned to the failed consumer.
- Multiple Consumer Groups: You can have multiple consumer groups reading from the
  same topic. Each group reads every record of the topic and tracks its own
  position, so the groups do not interfere with each other.
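The way partitions are shared out inside one group can be sketched as follows. This is a simplified illustration loosely modeled on a range-style assignment. In a real cluster the assignment is negotiated through the group's coordinator and a configurable assignor. The function assign_partitions and the consumer names are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Split partitions into contiguous blocks, one block per consumer."""
    base, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(sorted(consumers)):
        # The first `extra` consumers receive one partition more than the rest.
        count = base + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


partitions = [0, 1, 2, 3, 4, 5]
# Three consumers share six partitions.
print(assign_partitions(partitions, ["c1", "c2", "c3"]))
# If c1 fails, its partitions are redistributed among the survivors.
print(assign_partitions(partitions, ["c2", "c3"]))
# With more consumers than partitions, the surplus consumer stays idle.
print(assign_partitions([0, 1], ["c1", "c2", "c3"]))
```

The last call shows why adding consumers beyond the partition count does not increase parallelism: a partition is never split between two consumers of the same group.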
6. Producer
A producer publishes records (messages) to Kafka topics.
A producer can send a record to a specific partition
of a topic, or leave the choice to its partitioner, which typically
picks the partition based on the record key or other factors.
- Producer → Topic: A producer publishes records to a specific topic.
- Producer → Partition: The producer can either explicitly specify the partition or
  let the partitioner choose it, for example by hashing the record key.
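The partition choice can be illustrated with a small sketch. It is only a model: the Java client's default partitioner hashes keys with murmur2, whereas zlib.crc32 is used here as a stand-in, and the handling of keyless records varies by client version (a plain rotation is assumed below). The class SimplePartitioner and its method names are hypothetical.

```python
import itertools
import zlib


class SimplePartitioner:
    """Toy partitioner: hash keyed records, rotate keyless ones."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions
        self._next = itertools.count()  # rotation counter for keyless records

    def partition(self, key=None, explicit_partition=None):
        if explicit_partition is not None:
            return explicit_partition  # the caller chose the partition
        if key is not None:
            # Assumption: crc32 stands in for the real client's key hash.
            return zlib.crc32(key.encode("utf-8")) % self.num_partitions
        return next(self._next) % self.num_partitions


partitioner = SimplePartitioner(num_partitions=6)
print(partitioner.partition(key="user-42"))
print(partitioner.partition(key="user-42"))         # same key, same partition
print(partitioner.partition(explicit_partition=3))  # explicit choice wins
print([partitioner.partition() for _ in range(4)])  # keyless records rotate
```

Hashing the key means all records with the same key land in the same partition, which is what preserves their relative order for consumers.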
Interaction Between Components
Here’s how the components relate to each other:
- Producer → Topic: A producer sends records to a topic.
- Topic → Partition: A topic is split into partitions for
  scalability. Each partition is hosted on one broker and, when replicated, on several.
- Broker → Partition: A broker stores one or more partitions.
  Partitions are distributed across brokers in the cluster for load
  balancing.
- Offsets in Partitions:
  - Every record in a partition is assigned a sequential offset.
  - Consumers use offsets to track which records they have already read.
- Consumer Group → Partition: A consumer group consumes data from partitions.
  Within the group, only one consumer is responsible for each partition.
- Broker → Consumer Group: Kafka brokers serve data to consumer groups.
  If there are multiple consumer groups, each group reads the same stored
  records independently and tracks its own offsets; the data itself is not
  duplicated per group.
- Replication: Partitions can be replicated across brokers in the cluster for
  fault tolerance. Each partition has one leader (the primary replica) and
  zero or more followers (secondary replicas).
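Offsets and independent consumer groups can be made concrete with a toy model of a single partition. This is a sketch under simplifying assumptions: the class PartitionLog and the group names are hypothetical, and committed offsets are kept on the object for brevity, whereas Kafka stores them in an internal topic.

```python
class PartitionLog:
    """Toy model of one partition: an append-only list of records."""

    def __init__(self):
        self.records = []
        self.committed = {}  # consumer group id -> next offset to read

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record

    def poll(self, group, max_records=10):
        start = self.committed.get(group, 0)
        batch = self.records[start:start + max_records]
        # Simplification: the offset is committed as soon as the batch is read.
        self.committed[group] = start + len(batch)
        return batch


log = PartitionLog()
for event in ["login", "click", "logout"]:
    log.append(event)

print(log.poll("analytics", max_records=2))  # analytics reads the first two records
print(log.poll("billing"))                   # billing starts from offset 0 on its own
print(log.poll("analytics"))                 # analytics resumes where it left off
```

Both groups read from the same stored list. What differs per group is only the offset, which is why one group's progress never affects another's.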
Example Scenario
A concrete example makes these relationships clearer:
- Kafka Cluster: A Kafka cluster with 3 brokers (Broker A, Broker B, Broker C).
- Topic: There is a topic called user-activity.
- Partitions: The user-activity topic has 6 partitions (P0, P1, P2, P3, P4, P5).
- Consumer Group: A consumer group called activity-consumers has 3 consumers
  (Consumer 1, Consumer 2, Consumer 3).
Data Flow:
- Producer → Topic: A producer sends records (e.g., user activity logs) to the
  user-activity topic.
- Topic → Partitions → Broker: The user-activity topic is divided into 6
  partitions, which are spread across the 3 brokers for load balancing:
  - Partition 0 (P0) is stored on Broker A.
  - Partition 1 (P1) is stored on Broker B.
  - Partition 2 (P2) is stored on Broker C.
  - And so on for P3 through P5.
- Consumer Group → Partitions: The activity-consumers consumer group has 3
  consumers:
  - Consumer 1 is assigned partitions P0 and P1.
  - Consumer 2 is assigned partitions P2 and P3.
  - Consumer 3 is assigned partitions P4 and P5.
- Replication: Kafka replicates each partition across multiple brokers for fault
  tolerance. For example, partition P0, whose leader is on Broker A, might have
  follower replicas on Broker B and Broker C, so it stays available if Broker A fails.
Fault Tolerance:
- If Broker A goes down, a follower replica of P0 on Broker B or Broker C becomes
  the new leader, and Consumer 1 continues consuming P0 from it. P1 is stored on
  Broker B and is not affected.
- If Consumer 1 crashes, the group rebalances and another consumer in the group
  takes over the consumption of P0 and P1.
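The broker failure case in this scenario can be sketched as a small simulation. It rests on assumptions the example does not spell out: every partition is given three replicas in a rotating pattern that extends "P0 on A, P1 on B, P2 on C", all replicas are treated as fully in sync, and the first live replica in each list becomes leader. The function name leaders is hypothetical.

```python
brokers = ["A", "B", "C"]
# Replica list per partition; the rotation pattern is an assumption of this sketch.
replicas = {p: [brokers[(p + i) % 3] for i in range(3)] for p in range(6)}


def leaders(replicas, live_brokers):
    """Pick the first live replica of each partition as its leader."""
    return {p: next(b for b in rs if b in live_brokers)
            for p, rs in replicas.items()}


before = leaders(replicas, {"A", "B", "C"})
after = leaders(replicas, {"B", "C"})  # Broker A is down

for p in range(6):
    note = "  (leader moved)" if before[p] != after[p] else ""
    print(f"P{p}: {before[p]} -> {after[p]}{note}")
```

In this model only the partitions that Broker A was leading change leader. Consumers keep their partition assignments and simply fetch from the new leader.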
Summary of Relationships:
- Topic: A logical feed/category of records.
- Partition: A division of a topic for parallelism and scalability, stored on brokers.
- Broker: A server in the Kafka cluster that stores partitions and serves client
  requests.
- Consumer Group: A set of consumers that collaboratively consume records from
  partitions, ensuring each partition is consumed by only one consumer in the
  group.
- Kafka Cluster: A group of brokers working together to manage topics, partitions,
  and replication, providing high availability, fault tolerance, and scalability.
This architecture allows Kafka to efficiently distribute and
process large amounts of data across multiple machines while providing fault
tolerance, scalability, and parallelism.