Apache Kafka Tutorial for Beginners

What is Apache Kafka?

Apache Kafka is a powerful system for handling high-speed data streams in real time. It's widely used to build data pipelines and real-time applications that need to handle lots of data reliably.

Distributed: Runs across multiple machines for fault tolerance and scalability.
Event streaming platform: Publishes, stores, and lets you consume streams of events (messages).

Who created it?
Kafka was started at LinkedIn by Jay Kreps, Neha Narkhede, and Jun Rao, and became open source in 2011.

Why Kafka?

Traditional databases and message queues can't always keep up with big, fast-moving data. Kafka is built for:

Real-time analytics and dashboards
Event-driven architectures (microservices, etc.)
Log collection and metrics
Replaying data streams (audit, debugging)

Kafka vs RabbitMQ: Quick Comparison

Feature	Kafka	RabbitMQ
Model	Distributed event log	Classic message queue
Retention	Configurable (time/size)	Deleted after consumed
Message Replay	Yes, re-read old data	No
Throughput	Extremely high	Good for small bursts
Use Cases	Streaming, logs, analytics	Message passing, RPC

Core Concepts & Architecture

Event:

A record (message), usually a key-value pair, describing "something happened" (e.g. new signup).

Producer:

Sends (publishes) events/messages to Kafka topics.

Consumer:

Reads (consumes) events/messages from Kafka topics.

Topic:

A named feed/category for data — like a channel that producers write to and consumers read from.

Partition:

Each topic is split into partitions for scalability and speed. Each partition is an ordered, unchangeable sequence of events.

Consumer Group:

One or more consumers working together to read from a topic in parallel for scalability and redundancy.

Offset:

A unique number for each event within a partition. Kafka remembers what each consumer group has read.

Consumer Rebalancing:

When consumers join or leave a group, partition assignments shuffle automatically to spread work evenly.

Cluster:

A group of Kafka brokers (servers) working together to handle topics, partitions, and data.

Broker:

A single Kafka server; stores data and handles client requests.

Diagram: How Kafka Works

Getting Started Quickly: Kafka with Docker (KRaft Mode)

You can run Kafka instantly with a single command (no ZooKeeper needed):

docker run -d --name=kafka -p 9092:9092 apache/kafka:latest

-d: Detached/background mode
--name=kafka: Name this container
-p 9092:9092: Map port 9092
Uses latest Kafka image (default KRaft mode, so no ZooKeeper required!)

Check if Kafka is running with:

docker ps

Essential Kafka Commands (with Docker)

All commands below run inside the Kafka Docker container.

1. Create a Topic

docker exec -t kafka /opt/kafka/bin/kafka-topics.sh --create --topic demo-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1

2. List Topics

docker exec -t kafka /opt/kafka/bin/kafka-topics.sh --list --bootstrap-server localhost:9092

3. Describe a Topic

docker exec -t kafka /opt/kafka/bin/kafka-topics.sh --describe --topic demo-topic --bootstrap-server localhost:9092

4. Produce Messages (send data)

docker exec -it kafka /opt/kafka/bin/kafka-console-producer.sh --topic demo-topic --bootstrap-server localhost:9092

Type your messages, press Enter; Ctrl+C to exit.

5. Consume Messages (read data)

docker exec -it kafka /opt/kafka/bin/kafka-console-consumer.sh --topic demo-topic --from-beginning --bootstrap-server localhost:9092

Reads all events, even older ones. Ctrl+C to exit.

6. View Group Offsets

docker exec -it kafka /opt/kafka/bin/kafka-console-consumer.sh --topic demo-topic --group my-group --bootstrap-server localhost:9092
docker exec -t kafka /opt/kafka/bin/kafka-consumer-groups.sh --describe --group my-group --bootstrap-server localhost:9092

7. List Consumer Groups

docker exec -t kafka /opt/kafka/bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092

8. Delete a Topic

docker exec -t kafka /opt/kafka/bin/kafka-topics.sh --delete --topic demo-topic --bootstrap-server localhost:9092

9. Stop & Remove the Kafka Container

docker rm -f kafka

Integrating Kafka with Spring Boot

Spring Boot makes Kafka integration easy using the spring-kafka library for both producers and consumers.

Kafka Producer Example (Spring Boot)

Send messages to a Kafka topic from your app.

Add Maven/Gradle dependencies

<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>

// Producer service
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducerService {
    private final KafkaTemplate<String, String> kafkaTemplate;

    public KafkaProducerService(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void sendMessage(String topic, String message) {
        kafkaTemplate.send(topic, message);
    }
}

Usage Example:

// Autowire and call from a REST endpoint, etc.
producerService.sendMessage("demo-topic", "Hello from Spring Boot!");

Kafka Consumer Example (Spring Boot)

Receive and process messages from a Kafka topic.

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumerService {

    // Listen to topic with group "my-group"
    @KafkaListener(topics = "demo-topic", groupId = "my-group")
    public void consume(String message) {
        System.out.println("Consumed: " + message);
        // Add your processing logic here
    }
}

Serialization & Deserialization

Kafka handles sending (serializing) and receiving (deserializing) data—crucial for structured payloads like JSON or Avro.

Default: Spring Boot uses StringSerializer and StringDeserializer for strings.
Custom Objects:
Use JSON by specifying JsonSerializer and JsonDeserializer in your configuration.

application.yml example:

spring:
  kafka:
    consumer:
      bootstrap-servers: localhost:9092
      group-id: my-group
      value-deserializer: org.springframework.kafka.support.serializer.JsonDeserializer
      properties:
        spring.json.trusted.packages: "*"
    producer:
      bootstrap-servers: localhost:9092
      value-serializer: org.springframework.kafka.support.serializer.JsonSerializer

POJO message example:

public class UserEvent {
    private String userId;
    private String action;
    // getters, setters
}

Producer sends POJO:

kafkaTemplate.send("user-topic", new UserEvent("101", "signup"));

Consumer parses POJO:

@KafkaListener(topics = "user-topic", groupId = "my-group", containerFactory = "kafkaListenerContainerFactory")
public void consumeUserEvent(UserEvent event) {
    System.out.println(event.getUserId() + " -- " + event.getAction());
}

Set up a custom ConcurrentKafkaListenerContainerFactory bean in config for POJO support.

Error Handling: Retries & Dead-Letter Topics

Spring Kafka allows automatic retries and dead-letter topic (DLT) forwarding with annotations.

1. Retry Handling

Use the @RetryableTopic annotation:

import org.springframework.kafka.annotation.RetryableTopic;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class RetryConsumerService {

    @RetryableTopic(
        attempts = 3, // Total tries = 1 + 2 retries
        backoff = @RetryableTopic.Backoff(delay = 2000, multiplier = 2)
    )
    @KafkaListener(topics = "transaction-topic", groupId = "my-group")
    public void consume(String msg) {
        if (msg.contains("fail")) throw new RuntimeException("Processing failed");
        System.out.println("Processed: " + msg);
    }
}

Total attempts: 3 (1 initial try + 2 retries).
Initial delay (delay) before FIRST retry: 2000 ms (2 seconds).
multiplier: After each failed retry, the delay is increased by this factor (exponential backoff).

What happens?

If consume() throws, Spring Kafka retries (with backoff).
After retries, failed messages go to a DLT.

2. Dead-Letter Topic (DLT)

Spring Kafka automatically publishes failed messages to a dedicated DLT topic (<original-topic>.DLT) if retries are exhausted.

To monitor or process failed messages:

@KafkaListener(topics = "transaction-topic.DLT", groupId = "my-group")
public void dltConsumer(String message) {
    // Alert, log, or remediate
    System.out.println("DLT message: " + message);
}

Note:
You don’t need special config for basic DLT; @RetryableTopic handles topic creation and routing.

These examples give you the foundation for integrating Kafka in Spring Boot, handling structured payloads, and building resilient consumers!

Kafka Streams: Overview & Integration with Spring Boot

What is Kafka Streams—and What Problem Does It Solve?

Kafka Streams is a Java library for building scalable, fault-tolerant, event-driven applications and microservices.
It enables real-time processing and transformation of data streams directly within your application code, using standard Kafka topics as input/output—no external clusters or frameworks required.

Problems Kafka Streams solves:

Real-time data transformation and enrichment (e.g., filtering, joining, aggregating messages)
Stateful stream processing (windowed counts, aggregations, sessionization)
Handling large, unbounded streams where traditional batch ETL tools are insufficient
Easily microservice-ready: stream processing logic runs in your app alongside producer/consumer logic
Leverages Kafka’s reliability and scalability

Integrating Kafka Streams with Spring Boot

Spring Boot supports Kafka Streams via the spring-kafka library.

Maven/Gradle Dependency:

<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
</dependency>

application.yml (basic):

spring:
  kafka:
    streams:
      application-id: my-streams-app
      bootstrap-servers: localhost:9092

Typical Steps:

Define your stream processing topology as a bean in your Spring Boot app (using StreamsBuilder)
Input and output data via Kafka topics

Kafka Streams in Action: Method Examples

Java Example:
Below are patterns commonly used when processing Kafka streams. This example assumes a topic of records in JSON form (e.g., user events).

1. Filter: `filter()`, `filterNot()`

KStream<String, UserEvent> stream = builder.stream("input-topic");

// Keep only events where type is "CLICK"
KStream<String, UserEvent> clicks = stream.filter((k, v) -> "CLICK".equals(v.getType()));

// Remove LOGIN events
KStream<String, UserEvent> notLogins = stream.filterNot((k, v) -> "LOGIN".equals(v.getType()));

2. Map & Value Mapping: `map()`, `mapValues()`

// map() lets you change both key & value:
KStream<String, String> idAndType = stream
    .map((key, event) -> KeyValue.pair(event.getUserId(), event.getType()));

// mapValues() changes only the value (key unchanged)
KStream<String, Integer> userLengths = stream
    .mapValues(event -> event.getUserId().length());

3. FlatMap & FlatMapValues

// flatMap: each record can become 0 or more new records
KStream<String, String> exploded = stream
    .flatMap((key, event) -> {
        List<KeyValue<String, String>> result = new ArrayList<>();
        for (String action : event.getActions()) {
            result.add(KeyValue.pair(event.getUserId(), action));
        }
        return result;
    });

// flatMapValues: changes only value to 0 or more values, key stays the same
KStream<String, String> splitWords = stream
    .flatMapValues(event -> Arrays.asList(event.getType().split("\\s+")));

4. Branch

// Split stream into multiple based on predicates
KStream<String, UserEvent>[] branches = stream.branch(
    (k, v) -> "CLICK".equals(v.getType()),         // branch[0]: clicks
    (k, v) -> "LOGIN".equals(v.getType()),         // branch[1]: logins
    (k, v) -> true                                 // branch[2]: everything else
);

5. GroupBy, Aggregate, and Count

// Group events by userId
KGroupedStream<String, UserEvent> grouped = stream.groupBy((key, event) -> event.getUserId());

// Count events per user
KTable<String, Long> userCounts = grouped.count();

// Aggregate (e.g., collect all actions by a user into a list):
KTable<String, List<String>> actionAggregates = grouped.aggregate(
    ArrayList::new, // initializer
    (userId, event, list) -> { list.add(event.getType()); return list; }, // aggregator
    Materialized.<String, List<String>, KeyValueStore<Bytes, byte[]>>as("user-action-store")
);

Putting it together with Spring Boot

Kafka Streams Configuration Bean Example (Spring Boot):

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.*;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.annotation.EnableKafkaStreams;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableKafkaStreams
public class KafkaStreamsConfig {

    @Bean
    public KStream<String, UserEvent> kStream(StreamsBuilder builder) {
        KStream<String, UserEvent> stream = builder.stream("input-topic");

        KStream<String, UserEvent> onlyClicks = stream.filter(
            (k, v) -> "CLICK".equals(v.getType())
        );

        onlyClicks.to("clicks-topic");
        return stream;
    }
}

Spring will auto-create the required Kafka Streams infrastructure.
Use @EnableKafkaStreams and define topologies as beans.

With Kafka Streams and Spring Boot, you can transform, aggregate, and analyze streaming data in real time—using powerful functional patterns right in your Java code!

Command Palette