Apache Kafka Tutorial for Beginners
👋 Hey there! I’m Shohanur Rahman!
What is Apache Kafka?
Apache Kafka is a powerful system for handling high-speed data streams in real time. It's widely used to build data pipelines and real-time applications that need to handle lots of data reliably.
Distributed: Runs across multiple machines for fault tolerance and scalability.
Event streaming platform: Publishes, stores, and lets you consume streams of events (messages).
Who created it?
Kafka was started at LinkedIn by Jay Kreps, Neha Narkhede, and Jun Rao, and became open source in 2011.
Why Kafka?
Traditional databases and message queues can't always keep up with big, fast-moving data. Kafka is built for:
Real-time analytics and dashboards
Event-driven architectures (microservices, etc.)
Log collection and metrics
Replaying data streams (audit, debugging)
Kafka vs RabbitMQ: Quick Comparison
| Feature | Kafka | RabbitMQ |
| --- | --- | --- |
| Model | Distributed event log | Classic message queue |
| Retention | Configurable (time/size) | Removed once consumed and acknowledged |
| Message Replay | Yes, consumers can re-read old data | No (with classic queues) |
| Throughput | Extremely high | Good for small bursts |
| Use Cases | Streaming, logs, analytics | Message passing, RPC |
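The Retention and Message Replay rows are the heart of the difference. As a rough plain-Java model (this is not Kafka or RabbitMQ client code, and the class and method names are invented for illustration), a queue hands each message out once, while a log keeps its records and lets every reader pick its own starting position:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative model only: contrasts "delete on consume" with "retain and replay".
class LogVsQueue {

    // Queue-style delivery: polling removes the message from the queue.
    static List<String> drainQueue(Queue<String> queue) {
        List<String> received = new ArrayList<>();
        String msg;
        while ((msg = queue.poll()) != null) {
            received.add(msg);
        }
        return received;
    }

    // Log-style delivery: reading from an offset leaves the log untouched.
    static List<String> readFrom(List<String> log, int offset) {
        return new ArrayList<>(log.subList(offset, log.size()));
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>(List.of("a", "b", "c"));
        System.out.println(drainQueue(queue)); // first reader gets everything
        System.out.println(drainQueue(queue)); // a second pass finds nothing

        List<String> log = List.of("a", "b", "c");
        System.out.println(readFrom(log, 0));  // full replay
        System.out.println(readFrom(log, 0));  // replay again, same data
        System.out.println(readFrom(log, 2));  // resume from a later offset
    }
}
```

In real Kafka the log is also subject to the retention settings, so very old records eventually disappear even if nobody has read them.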
Core Concepts & Architecture
Event:
A record (message), usually a key-value pair, describing "something happened" (e.g. new signup).
Producer:
Sends (publishes) events/messages to Kafka topics.
Consumer:
Reads (consumes) events/messages from Kafka topics.
Topic:
A named feed/category for data — like a channel that producers write to and consumers read from.
Partition:
Each topic is split into partitions for scalability and speed. Each partition is an ordered, unchangeable sequence of events.
Consumer Group:
One or more consumers working together to read from a topic in parallel for scalability and redundancy.
Offset:
A sequential number that identifies each event's position within a partition. Kafka stores the last committed offset for each consumer group, so a group can resume where it left off.
Consumer Rebalancing:
When consumers join or leave a group, partition assignments shuffle automatically to spread work evenly.
Cluster:
A group of Kafka brokers (servers) working together to handle topics, partitions, and data.
Broker:
A single Kafka server; stores data and handles client requests.
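To make partitions, consumer groups, and rebalancing more concrete, here is a small plain-Java model. It is a sketch under simplifying assumptions: Kafka's default partitioner hashes the key bytes with murmur2 rather than String.hashCode(), real assignment strategies are more involved than plain round-robin, and the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of key-based partitioning and of spreading partitions
// across the consumers of one group. Not Kafka's real algorithms.
class PartitionModel {

    // String.hashCode() stands in for Kafka's key hash. The point is only
    // that equal keys always land in the same partition, which preserves
    // per-key ordering.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Round-robin assignment of partitions to consumers, recomputed whenever
    // group membership changes (a stand-in for a rebalance).
    // Assumes the consumer list is not empty.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> assignment = new TreeMap<>();
        for (String consumer : consumers) {
            assignment.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            String owner = consumers.get(p % consumers.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Same key, same partition, every time.
        System.out.println(partitionFor("user-42", 3) == partitionFor("user-42", 3));
        // Two consumers share three partitions.
        System.out.println(assign(List.of("c1", "c2"), 3));
        // A third consumer joins and the work is redistributed.
        System.out.println(assign(List.of("c1", "c2", "c3"), 3));
    }
}
```

Note that with more consumers than partitions, some consumers in this model (and in Kafka) would receive nothing, which is why the partition count caps a group's parallelism.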
Diagram: How Kafka Works
Getting Started Quickly: Kafka with Docker (KRaft Mode)
You can run Kafka instantly with a single command (no ZooKeeper needed):
docker run -d --name=kafka -p 9092:9092 apache/kafka:latest
-d: Detached/background mode
--name=kafka: Name this container
-p 9092:9092: Map port 9092
Uses latest Kafka image (default KRaft mode, so no ZooKeeper required!)
Check if Kafka is running with:
docker ps
Essential Kafka Commands (with Docker)
All commands below run inside the Kafka Docker container via docker exec (-t allocates a terminal; -it also keeps stdin open for the interactive tools).
1. Create a Topic
docker exec -t kafka /opt/kafka/bin/kafka-topics.sh --create --topic demo-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
2. List Topics
docker exec -t kafka /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092
3. Describe a Topic
docker exec -t kafka /opt/kafka/bin/kafka-topics.sh --describe --topic demo-topic --bootstrap-server localhost:9092
4. Produce Messages (send data)
docker exec -it kafka /opt/kafka/bin/kafka-console-producer.sh --topic demo-topic --bootstrap-server localhost:9092
Type your messages, press Enter; Ctrl+C to exit.
5. Consume Messages (read data)
docker exec -it kafka /opt/kafka/bin/kafka-console-consumer.sh --topic demo-topic --from-beginning --bootstrap-server localhost:9092
Reads all events, even older ones. Ctrl+C to exit.
6. View Group Offsets
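# First start a consumer in group "my-group" so the group exists and commits offsets (Ctrl+C to exit)
docker exec -it kafka /opt/kafka/bin/kafka-console-consumer.sh --topic demo-topic --group my-group --bootstrap-server localhost:9092
# Then show the group's committed offset, log end offset, and lag for each partition
docker exec -t kafka /opt/kafka/bin/kafka-consumer-groups.sh --describe --group my-group --bootstrap-server localhost:9092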
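The --describe output includes CURRENT-OFFSET, LOG-END-OFFSET, and LAG columns. The arithmetic behind LAG is simple enough to sketch in plain Java; the numbers below are hypothetical, and treating a partition with no committed offset as "nothing read yet" is a simplification made for this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative arithmetic only: lag per partition = log end offset - committed offset.
class ConsumerLag {

    static Map<Integer, Long> lagPerPartition(Map<Integer, Long> logEndOffsets,
                                              Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> entry : logEndOffsets.entrySet()) {
            // Simplification: a partition without a committed offset counts from 0.
            long committed = committedOffsets.getOrDefault(entry.getKey(), 0L);
            lag.put(entry.getKey(), entry.getValue() - committed);
        }
        return lag;
    }

    public static void main(String[] args) {
        // Hypothetical numbers for a 3-partition topic.
        Map<Integer, Long> logEnd = Map.of(0, 10L, 1, 7L, 2, 4L);
        Map<Integer, Long> committed = Map.of(0, 10L, 1, 5L);
        System.out.println(lagPerPartition(logEnd, committed));
    }
}
```

A lag of zero means the group has caught up on that partition; a steadily growing lag usually means consumers cannot keep pace with producers.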
7. List Consumer Groups
docker exec -t kafka /opt/kafka/bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092
8. Delete a Topic
docker exec -t kafka /opt/kafka/bin/kafka-topics.sh --delete --topic demo-topic --bootstrap-server localhost:9092
9. Stop & Remove the Kafka Container
docker rm -f kafka
Integrating Kafka with Spring Boot
Spring Boot makes Kafka integration easy using the spring-kafka library for both producers and consumers.
Kafka Producer Example (Spring Boot)
Send messages to a Kafka topic from your app.
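Add the Maven dependency (Gradle: implementation 'org.springframework.kafka:spring-kafka'). Spring Boot's dependency management supplies the version.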
<dependency>
<groupId>org.springframework.kafka</groupId>
<artifactId>spring-kafka</artifactId>
</dependency>
// Producer service
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducerService {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaProducerService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void sendMessage(String topic, String message) {
        kafkaTemplate.send(topic, message);
    }
}
Usage Example:
// Autowire and call from a REST endpoint, etc.
producerService.sendMessage("demo-topic", "Hello from Spring Boot!");
Kafka Consumer Example (Spring Boot)
Receive and process messages from a Kafka topic.
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumerService {

    // Listen to topic with group "my-group"
    @KafkaListener(topics = "demo-topic", groupId = "my-group")
    public void consume(String message) {
        System.out.println("Consumed: " + message);
        // Add your processing logic here
    }
}
Serialization & Deserialization
Kafka handles sending (serializing) and receiving (deserializing) data—crucial for structured payloads like JSON or Avro.
Default: Spring Boot configures StringSerializer and StringDeserializer for both keys and values.
Custom Objects:
Use JSON by specifying JsonSerializer and JsonDeserializer in your configuration.
application.yml example:
spring:
  kafka:
    consumer:
      bootstrap-servers: localhost:9092
      group-id: my-group
      value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer
      properties:
        # "*" trusts every package; fine for a local demo, restrict it in production
        spring.json.trusted.packages: "*"
    producer:
      bootstrap-servers: localhost:9092
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
POJO message example:
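public class UserEvent {
    private String userId;
    private String action;

    public UserEvent() { } // no-arg constructor, needed for JSON deserialization
    public UserEvent(String userId, String action) { this.userId = userId; this.action = action; }
    // getters, setters
}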
Producer sends POJO (inject a KafkaTemplate<String, UserEvent> rather than the String-valued template used earlier):
kafkaTemplate.send("user-topic", new UserEvent("101", "signup"));
Consumer parses POJO:
@KafkaListener(topics = "user-topic", groupId = "my-group", containerFactory = "kafkaListenerContainerFactory")
public void consumeUserEvent(UserEvent event) {
System.out.println(event.getUserId() + " -- " + event.getAction());
}
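With the JSON settings in application.yml, Spring Boot's auto-configured kafkaListenerContainerFactory is usually enough for POJO support. Define a custom ConcurrentKafkaListenerContainerFactory bean only if a listener needs different settings.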
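Under the hood, a serializer turns an object into bytes and a deserializer turns those bytes back into an object. The round trip can be sketched without any Kafka or JSON library. The "userId|action" wire format, the nested UserEvent class, and the codec class below are invented for this illustration, and the sketch assumes userId never contains a '|' character; real setups would use JSON or Avro as described above.

```java
import java.nio.charset.StandardCharsets;

// Stand-in for what a serializer/deserializer pair does: object -> bytes -> object.
class UserEventCodec {

    // Minimal event type local to this sketch.
    static final class UserEvent {
        final String userId;
        final String action;

        UserEvent(String userId, String action) {
            this.userId = userId;
            this.action = action;
        }
    }

    // Encode the event as UTF-8 bytes in the made-up "userId|action" format.
    static byte[] serialize(UserEvent event) {
        return (event.userId + "|" + event.action).getBytes(StandardCharsets.UTF_8);
    }

    // Decode the bytes and reject payloads that do not match the format.
    static UserEvent deserialize(byte[] bytes) {
        String[] parts = new String(bytes, StandardCharsets.UTF_8).split("\\|", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Malformed payload");
        }
        return new UserEvent(parts[0], parts[1]);
    }

    public static void main(String[] args) {
        byte[] wire = serialize(new UserEvent("101", "signup"));
        UserEvent back = deserialize(wire);
        System.out.println(back.userId + " -- " + back.action);
    }
}
```

The failure branch matters in practice: a consumer that receives bytes it cannot decode has to decide whether to skip, retry, or dead-letter the record.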
Error Handling: Retries & Dead-Letter Topics
Spring Kafka allows automatic retries and dead-letter topic (DLT) forwarding with annotations.
1. Retry Handling
Use the @RetryableTopic annotation:
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.annotation.RetryableTopic;
import org.springframework.retry.annotation.Backoff;
import org.springframework.stereotype.Service;

@Service
public class RetryConsumerService {

    // Assumes Spring Kafka 2.7 to 3.x, where backoff takes Spring Retry's @Backoff
    @RetryableTopic(
        attempts = "3", // attempts is a String; total tries = 1 + 2 retries
        backoff = @Backoff(delay = 2000, multiplier = 2)
    )
    @KafkaListener(topics = "transaction-topic", groupId = "my-group")
    public void consume(String msg) {
        if (msg.contains("fail")) throw new RuntimeException("Processing failed");
        System.out.println("Processed: " + msg);
    }
}
Total attempts: 3 (1 initial try + 2 retries).
delay: Initial delay before the FIRST retry: 2000 ms (2 seconds).
multiplier: After each failed retry, the delay is multiplied by this factor (exponential backoff).
What happens?
If consume() throws, Spring Kafka retries (with backoff).
After retries, failed messages go to a DLT.
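The delay and multiplier settings translate into a concrete wait schedule. The helper below is a plain-Java sketch of that arithmetic, not Spring code; the class and method names are made up, and it ignores any maximum-delay cap a real backoff policy may apply.

```java
import java.util.ArrayList;
import java.util.List;

// Computes the wait before each retry for an exponential backoff policy.
class BackoffSchedule {

    // attempts counts the first try too, so there are (attempts - 1) retries.
    static List<Long> retryDelaysMs(int attempts, long initialDelayMs, double multiplier) {
        List<Long> delays = new ArrayList<>();
        double delay = initialDelayMs;
        for (int retry = 1; retry < attempts; retry++) {
            delays.add((long) delay);
            delay *= multiplier; // grow the wait for the next retry
        }
        return delays;
    }

    public static void main(String[] args) {
        System.out.println(retryDelaysMs(3, 2000, 2)); // [2000, 4000]
        System.out.println(retryDelaysMs(5, 2000, 2)); // [2000, 4000, 8000, 16000]
    }
}
```

With attempts = 3, delay = 2000, and multiplier = 2, a message that keeps failing waits 2 seconds before the first retry and 4 seconds before the second.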
2. Dead-Letter Topic (DLT)
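When retries are exhausted, Spring Kafka publishes the failed message to a dedicated dead-letter topic. With @RetryableTopic the default name is <original-topic>-dlt; the .DLT suffix is the default of DeadLetterPublishingRecoverer, which is configured separately.
To monitor or process failed messages (alternatively, annotate a method in the same class with @DltHandler):
@KafkaListener(topics = "transaction-topic-dlt", groupId = "my-group")
public void dltConsumer(String message) {
    // Alert, log, or remediate
    System.out.println("DLT message: " + message);
}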
Note:
You don’t need special config for basic DLT; @RetryableTopic handles topic creation and routing.
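The overall retry-then-dead-letter flow can be modeled in a few lines of plain Java. This is only a sketch of the control flow: there are no real topics or delays, the in-memory list stands in for the DLT, and all names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Plain-Java model of "retry, then dead-letter". No real topics or delays involved.
class RetryThenDeadLetter {

    // Stand-in for the dead-letter topic.
    final List<String> deadLetters = new ArrayList<>();

    // Returns the number of attempts used. Messages that fail maxAttempts
    // times are added to deadLetters.
    int process(String message, int maxAttempts, Consumer<String> handler) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return attempt; // success, stop retrying
            } catch (RuntimeException e) {
                // A real setup would wait for the backoff delay before the next try.
            }
        }
        deadLetters.add(message);
        return maxAttempts;
    }

    public static void main(String[] args) {
        RetryThenDeadLetter pipeline = new RetryThenDeadLetter();
        Consumer<String> handler = msg -> {
            if (msg.contains("fail")) throw new RuntimeException("Processing failed");
        };
        pipeline.process("ok-1", 3, handler);
        pipeline.process("fail-2", 3, handler);
        System.out.println(pipeline.deadLetters); // only the message that never succeeded
    }
}
```

Keeping failed messages in a separate place means one bad record does not block the records behind it.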
These examples give you the foundation for integrating Kafka in Spring Boot, handling structured payloads, and building resilient consumers!
Kafka Streams: Overview & Integration with Spring Boot
What is Kafka Streams—and What Problem Does It Solve?
Kafka Streams is a Java library for building scalable, fault-tolerant, event-driven applications and microservices.
It enables real-time processing and transformation of data streams directly within your application code, using standard Kafka topics as input/output—no external clusters or frameworks required.
Problems Kafka Streams solves:
Real-time data transformation and enrichment (e.g., filtering, joining, aggregating messages)
Stateful stream processing (windowed counts, aggregations, sessionization)
Handling large, unbounded streams where traditional batch ETL tools are insufficient
Easily microservice-ready: stream processing logic runs in your app alongside producer/consumer logic
Leverages Kafka’s reliability and scalability
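"Stateful" here means the processor remembers something between records. A windowed count is a typical case, and the idea can be sketched in plain Java without the Kafka Streams API. The class below is a hypothetical illustration that assumes non-negative millisecond timestamps and keeps its state in memory, whereas Kafka Streams backs such state with fault-tolerant state stores.

```java
import java.util.Map;
import java.util.TreeMap;

// Counts events per fixed-size (tumbling) time window, keeping the counts as state.
class TumblingWindowCounter {

    private final long windowSizeMs;
    private final Map<Long, Long> countsByWindowStart = new TreeMap<>();

    TumblingWindowCounter(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Each event falls into exactly one window: [start, start + windowSizeMs).
    void record(long eventTimestampMs) {
        long windowStart = eventTimestampMs - (eventTimestampMs % windowSizeMs);
        countsByWindowStart.merge(windowStart, 1L, Long::sum);
    }

    Map<Long, Long> counts() {
        return countsByWindowStart;
    }

    public static void main(String[] args) {
        TumblingWindowCounter counter = new TumblingWindowCounter(1000);
        for (long ts : new long[] {100, 900, 1000, 1500, 2500}) {
            counter.record(ts);
        }
        System.out.println(counter.counts()); // window start -> number of events
    }
}
```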
Integrating Kafka Streams with Spring Boot
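Spring Boot supports Kafka Streams via the spring-kafka library. The kafka-streams artifact is an optional dependency of spring-kafka, so add it explicitly.
Maven dependencies:
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
</dependency>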
application.yml (basic):
spring:
  kafka:
    streams:
      application-id: my-streams-app
      bootstrap-servers: localhost:9092
Typical Steps:
Define your stream processing topology as a bean in your Spring Boot app (using StreamsBuilder)
Input and output data via Kafka topics
Kafka Streams in Action: Method Examples
Java Example:
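Below are patterns commonly used when processing Kafka streams. This example assumes a topic of records in JSON form (e.g., user events), a StreamsBuilder named builder, and a UserEvent class that exposes getUserId(), getType(), and getActions(). The POJO shown earlier would need those accessors added.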
1. Filter: filter(), filterNot()
KStream<String, UserEvent> stream = builder.stream("input-topic");
// Keep only events where type is "CLICK"
KStream<String, UserEvent> clicks = stream.filter((k, v) -> "CLICK".equals(v.getType()));
// Remove LOGIN events
KStream<String, UserEvent> notLogins = stream.filterNot((k, v) -> "LOGIN".equals(v.getType()));
2. Map & Value Mapping: map(), mapValues()
// map() lets you change both key & value:
KStream<String, String> idAndType = stream
.map((key, event) -> KeyValue.pair(event.getUserId(), event.getType()));
// mapValues() changes only the value (key unchanged)
KStream<String, Integer> userLengths = stream
.mapValues(event -> event.getUserId().length());
3. FlatMap & FlatMapValues
// flatMap: each record can become 0 or more new records
KStream<String, String> exploded = stream
    .flatMap((key, event) -> {
        List<KeyValue<String, String>> result = new ArrayList<>();
        for (String action : event.getActions()) {
            result.add(KeyValue.pair(event.getUserId(), action));
        }
        return result;
    });
// flatMapValues: changes only value to 0 or more values, key stays the same
KStream<String, String> splitWords = stream
    .flatMapValues(event -> Arrays.asList(event.getType().split("\\s+")));
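If the map versus flatMap distinction is new, java.util.stream has the same pair of operations on finite collections, which makes it easy to experiment without a broker. The sketch below is an analogy only and does not use the Kafka Streams API; the class and method names are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// java.util.stream analogue (finite lists, no Kafka): map is one-in/one-out,
// flatMap is one-in/zero-or-more-out.
class MapVsFlatMap {

    // Exactly one output per input.
    static List<Integer> lengths(List<String> values) {
        return values.stream().map(String::length).collect(Collectors.toList());
    }

    // Each input expands into its words; blank inputs contribute nothing.
    static List<String> words(List<String> values) {
        return values.stream()
                .flatMap(v -> Arrays.stream(v.trim().split("\\s+")))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> values = List.of("page view", "click", "");
        System.out.println(lengths(values)); // three inputs, three outputs
        System.out.println(words(values));   // three inputs, a different number of outputs
    }
}
```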
4. Branch
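// Split stream into multiple based on predicates; each record goes to the first match.
// split() is used here because branch() has been deprecated since Kafka 2.8.
Map<String, KStream<String, UserEvent>> branches = stream
    .split(Named.as("events-"))
    .branch((k, v) -> "CLICK".equals(v.getType()), Branched.as("clicks")) // key "events-clicks"
    .branch((k, v) -> "LOGIN".equals(v.getType()), Branched.as("logins")) // key "events-logins"
    .defaultBranch(Branched.as("other"));                                  // key "events-other"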
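Branching follows a first-match rule: predicates are tried in order and a record lands in exactly one branch. The plain-Java model below illustrates that rule on a list of event types. It is not Kafka Streams code, and the router class and branch names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Models branching semantics: each record goes to the first branch whose
// predicate matches, and to no other.
class FirstMatchRouter {

    // branches is a LinkedHashMap so that predicates are tried in insertion order.
    static Map<String, List<String>> route(List<String> records,
                                           LinkedHashMap<String, Predicate<String>> branches) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String name : branches.keySet()) {
            out.put(name, new ArrayList<>());
        }
        for (String record : records) {
            for (Map.Entry<String, Predicate<String>> branch : branches.entrySet()) {
                if (branch.getValue().test(record)) {
                    out.get(branch.getKey()).add(record);
                    break; // first match wins
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Predicate<String>> branches = new LinkedHashMap<>();
        branches.put("clicks", "CLICK"::equals);
        branches.put("logins", "LOGIN"::equals);
        branches.put("other", type -> true); // catch-all, must come last
        System.out.println(route(List.of("CLICK", "LOGIN", "SCROLL", "CLICK"), branches));
    }
}
```

Order matters: if the catch-all predicate were registered first, the other two branches would stay empty.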
5. GroupBy, Aggregate, and Count
// Group events by userId
KGroupedStream<String, UserEvent> grouped = stream.groupBy((key, event) -> event.getUserId());
// Count events per user
KTable<String, Long> userCounts = grouped.count();
// Aggregate (e.g., collect all actions by a user into a list).
// Note: the List<String> value needs a suitable serde (for example via
// Materialized.withValueSerde); the default serdes usually cannot handle it.
KTable<String, List<String>> actionAggregates = grouped.aggregate(
    ArrayList::new, // initializer
    (userId, event, list) -> { list.add(event.getType()); return list; }, // aggregator
    Materialized.<String, List<String>, KeyValueStore<Bytes, byte[]>>as("user-action-store")
);
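For intuition, the same grouping, counting, and aggregating can be done over a finite list with java.util.stream collectors. This is a batch analogy rather than Kafka Streams code: a KTable keeps updating as new events arrive, while the maps below are computed once. Events are represented as hypothetical {userId, type} string pairs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Batch analogue of groupBy + count / aggregate over a finite list of {userId, type} pairs.
class GroupAndCount {

    // Number of events per user, like groupBy(...).count().
    static Map<String, Long> countPerUser(List<String[]> events) {
        return events.stream()
                .collect(Collectors.groupingBy(e -> e[0], TreeMap::new, Collectors.counting()));
    }

    // All event types per user in arrival order, like the aggregate(...) example.
    static Map<String, List<String>> typesPerUser(List<String[]> events) {
        return events.stream()
                .collect(Collectors.groupingBy(e -> e[0], TreeMap::new,
                        Collectors.mapping(e -> e[1], Collectors.toList())));
    }

    public static void main(String[] args) {
        List<String[]> events = List.of(
                new String[] {"u1", "CLICK"},
                new String[] {"u2", "LOGIN"},
                new String[] {"u1", "LOGIN"});
        System.out.println(countPerUser(events));
        System.out.println(typesPerUser(events));
    }
}
```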
Putting it together with Spring Boot
Kafka Streams Configuration Bean Example (Spring Boot):
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.annotation.EnableKafkaStreams;

@Configuration
@EnableKafkaStreams
public class KafkaStreamsConfig {

    @Bean
    public KStream<String, UserEvent> kStream(StreamsBuilder builder) {
        KStream<String, UserEvent> stream = builder.stream("input-topic");
        KStream<String, UserEvent> onlyClicks = stream.filter(
            (k, v) -> "CLICK".equals(v.getType())
        );
        onlyClicks.to("clicks-topic");
        return stream;
    }
}
Spring will auto-create the required Kafka Streams infrastructure.
Use @EnableKafkaStreams and define topologies as beans.
With Kafka Streams and Spring Boot, you can transform, aggregate, and analyze streaming data in real time—using powerful functional patterns right in your Java code!