Understanding Consumer Groups and Partitions in Kafka

Let's Connect

In a typical modern enterprise, there would be multiple systems producing data in different formats and multiple systems which has to consume these data. Integrating these systems using a full mesh topology will be complicated owing to the myraid of protocols (HTTP,gRPC, Rsocket, JDBC, REST, TCP etc) and data formats (JSON, AVRO, CSV, Binary). Apache Kafka is designed to solve the data integration problem which arises when the number of systems producing and their corresponding consumers need to be integrated.

It can be used as a data integration layer such that the system producing the data can publish their data to an Apache Kafka Topic while consumer systems can subscribe to this topic to receive data.

Within Kafka, a topic is used to organize related events. They can be equated to SQL Tables in a relational database. However, as compared to a table in a relational database, we cannot query a Kafka topic. All applications that send data to a topic are called producers. Whereas applications receiving data by subscribing to a topic are called consumers. A single topic can have multiple consumers as shown below.

A Topic within Kafka is divided into partitions. The number of partitions within a topic is specified during the creation of a topic. It is important to note that Kafka guarantees the ordering of messages only within a partition, but not across partitions.

Further, the topics can be replicated across different Kafka Brokers for high availability. A high-level view of a producer producing to a set of topic partitions replicated across three brokers is shown in the figure below


Producers, Topics and Partitions

A producer sends a message to a topic and the message is distributed to a particular topic depending on the value of the messageKey. Each message sent to Kafka has an optional key and a value. If the is not specified by the producer, then messages are distributed across the topic in a round-robin fashion. However, if a key is specified by the producer, then all messages having the same value of the key ends up in the same partition.

Consumer, Consumer Groups and Partitions

Consumers reads data from Kafka Topics. In effect, consumers read from one or more partitions and data is read in order within each partition from lower offset to higher offset. Although the message read order is still guaranteed within each individual partition, the message order across many partitions is not guaranteed if the consumer consumes data from more than one at once.

For the purpose of horizontal scalability, consumers can be grouped together in a consumer group. A Kafka consumer group is made up of consumers who are a part of the same application. For example, they can be different instances of the same (micro) service which is consumed from a topic. As discussed above, a topic typically has numerous partitions. For Kafka consumers, these partitions serve as a parallelism unit. The advantage of using a Kafka consumer group is that the group’s members will work together to divide the effort of reading from various partitions.

For grouping consumers with the same consumer group, they must have the same group.id property. A GroupCoordinator and a ConsumerCoordinator are automatically used by Kafka Consumers to allocate consumers to a partition and make sure load balancing is accomplished across all consumers in the same group. A consumer from a consumer group can be assigned many topic partitions, but it’s vital to note that each topic partition is only assigned to one consumer inside a consumer group.

Consider four partitions of a topic and three consumers in a consumer group as shown in the diagram below:

Partitions 0 and 1 have been assigned to Consumer 1 of the consumer group consumer-group-application-1, while Partitions 2 and 3 have been assigned to Consumer 2, and Partition 4 has been assigned to Consumer 3. Messages from Partitions 0 and 1 are only received by Consumer 1, those from Partitions 2 and 3 are only received by Consumer 2, while messages from Partition 4 are only received by Consumer 3.

Some consumers will remain dormant as seen below if the number of customers exceeds the number of partitions of a topic. A consumer group typically contains the same number of consumers as partitions. When constructing the topic, we should add more partitions if we want more customers and higher throughput. If not, some of the consumers might continue to be inactive.


AI Impact on Last-Mile Delivery in Logistics & Supply Chain

AI Impact on Last-Mile Delivery in Logistics & Supply Chain

In today's fast-paced e-commerce environment, last-mile delivery represents a critical component of the logistics and supply chain process. As consumer expectations for speed and efficiency escalate, businesses are increasingly turning to Artificial Intelligence (AI)...

The Power of Augmented Reality on Print Media

The Power of Augmented Reality on Print Media

In the evolving retail landscape, the integration of technology into traditional media is opening up new avenues for customer engagement. One of the most exciting developments is the use of Mixed Reality (MR) and, more specifically, Augmented Reality (AR) in print...

Streamline Your Data Analysis with Septa

Streamline Your Data Analysis with Septa

Tired of the coding roadblocks hindering your data exploration? Septa offers a revolutionary solution: AI-powered analysis that empowers anyone, regardless of technical expertise, to unlock the value of their data. Forget the days of: Struggling with complex SQL...

Exploring Real-world Use Cases for WebAR in Various Industries

Exploring Real-world Use Cases for WebAR in Various Industries

Augmented Reality (AR) has evolved beyond gaming and is making significant strides in various industries. WebAR offers accessibility and versatility, opening doors to a multitude of real-world applications. In this blog post, we'll delve into the diverse use cases of...

Security Considerations in WebAR Development: Protecting User Privacy

Security Considerations in WebAR Development: Protecting User Privacy

User trust is foundational for the success of any technology. Ensuring robust security measures in WebAR development not only protects users but also fosters trust, encouraging broader adoption of AR experiences. WebAR applications may involve real-time communication...

Let’s discuss your project!

AI Impact on Last-Mile Delivery in Logistics & Supply Chain

AI Impact on Last-Mile Delivery in Logistics & Supply Chain

In today's fast-paced e-commerce environment, last-mile delivery represents a critical component of the logistics and supply chain process. As consumer expectations for speed and efficiency escalate, businesses are increasingly turning to Artificial Intelligence (AI)...