Kafka for Kids
Technology is simple. The tools we use in industry today solve complex problems, and they aim to do so in the simplest possible way. Computer science buzzwords like caching and queueing are simple ideas that emerge from creative thinking. Each of these ideas aims to answer one fundamental question:
“How can we optimize further?”
Today we talk about one such technology.
Meet Harry. Harry is a genius at magic. He has three friends: Tom, Jerry, and Scooby. They all want to learn magic from Harry. Tom wants to learn magic to organize a magic show. Jerry wants to study magic to protect himself, and Scooby wants to learn it so that he can train others. Each of Harry’s friends has a different goal but needs the same skill: magic. Traditionally, Harry would teach Tom first. Once he finishes coaching Tom, he teaches Jerry next and finally moves on to Scooby.
Unfortunately, this takes up a lot of Harry’s time 😟. Harry also has plenty of other important things to do, like studying, eating, and playing. Now Harry thinks: “How can I save my time?” In other words, “How can I optimize my time?” 🤔
Harry comes up with a solution 🤓. He writes a book on the subject. He puts the book in the library with one rule: no one can take the book out of the library. Anyone can make a copy of the book and do anything with that copy. The original book remains unchanged.
The librarian, happy to help, finds a cabinet for Harry’s book and puts the book in that cabinet.
Result? Harry can spend his time studying new magic tricks while his friends benefit from his book at the same time.
This not only saves Harry a lot of time. He also need not worry about the book’s maintenance, security, and so on. The library takes care of that for him for free.
Many applications are in a situation similar to Harry’s. They want to achieve a lot, and in the least amount of time. For example, your food delivery application not only places your order but also sends you a mobile notification about it. Your online shopping application generates recommendations for the future whenever you buy something new. The primary job of these applications is to place your order. Yet, for a better user experience, they also need to perform additional tasks such as sending notifications and generating recommendations. What if your application had a library that helped with all these secondary tasks? Enter Kafka 😎
Mapping the analogy above to Kafka terminology:
- Harry is analogous to any application that has a lot of work to do and wants to offload some of it to save time. Harry is the Producer.
- The book is the data that everyone needs to reach their goals. In this case, holding a magic show, teaching magic to others, and learning magic for self-defense all require the same data: the book of magic. The book is the Record. It holds the data you want to pass between two systems: the one that has the data and the one that needs it to fulfill its business needs.
- The library, which keeps the book safe, is Kafka. The library can have many buildings at different locations. Similarly, a Kafka cluster is composed of many Brokers, i.e. servers.
- The library keeps the book in a cabinet. This cabinet is a Topic. Usually, we make one topic for each type of data. In this case, we would have separate cabinets for cooking books and magic books. We can have any number of cabinets in the library (restricted only by the space in the library). Similarly, we can have any number of topics in Kafka.
- Every book in the library is copied, and the copies are kept at different locations to safeguard against damage to the original. In Kafka, you specify a replication factor for a Topic, indicating how many copies of the data you want.
- All of Harry’s friends who come to use the book are Consumers. As per the rule, none of them can take the book away. However, they can make a photocopy of the book and use it in any way they want, like reading it, selling it, or sending it to a friend. Harry has no control over who uses his book or how. In the same way, the producer has no say over the consumers and their processing logic.
- Now, why did Harry choose to keep his book in the library rather than keep it himself?
- Harry did not want to be involved in any transaction related to the book. Nobody has to disturb him to get the book. Anybody who wants the book contacts the library, which saves Harry the work. He can now focus on more important things. In computer jargon, this is called decoupling the data pipelines.
- Today Harry has written one book. Soon he may have many books on the same or different subjects. Harry has neither the space nor the expertise to handle such a large collection, so he relies on the library to manage it. In the same way, Kafka is an expert at handling large volumes of data and can cope with an increase in load (more consumers, producers, or records). It provides a distributed and scalable solution.
- Harry does not want to take on operational overheads such as damaged books or running out of space for new ones. The library always keeps a backup of each book, which it uses if the original is damaged. In other words, Kafka replicates the data across multiple servers to provide fault tolerance and reliability.
- If Harry had taught magic to all his friends himself, he would have had to do a lot of work, one friend at a time, and he would not have been able to get to his more important work. The library helps him multitask. In the same way, Kafka lets many consumers process your data in parallel and takes that work off the producer application.
- Just as a library system has an administrator who looks after its many buildings, Kafka has an admin too. The admin makes crucial decisions such as which cabinets exist, what to do when a building needs maintenance, and who takes over the books of that building. Kafka has traditionally relied on Apache ZooKeeper for such situations, for example topic creation and broker unavailability. Newer Kafka versions can handle this coordination themselves through KRaft mode.
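To make the mapping above concrete, here is a minimal sketch of the library in plain Python. It is a toy model, not real Kafka and not a Kafka client: the class names `Library`, `Topic`, and `Reader`, the `magic` topic, and the chapter titles are all made up for illustration, and a real cluster does far more than this.

```python
class Topic:
    """A 'cabinet': an append-only list of records (the 'books')."""

    def __init__(self, name):
        self.name = name
        self.records = []


class Library:
    """Toy stand-in for a Kafka cluster. Each dict in `brokers` is one 'building'."""

    def __init__(self, broker_count=3):
        self.brokers = [{} for _ in range(broker_count)]

    def create_topic(self, name, replication=2):
        # Keep `replication` separate copies of the cabinet, one per broker.
        for broker in self.brokers[:replication]:
            broker[name] = Topic(name)

    def publish(self, name, record):
        # The producer hands the record over once; every replica stores it.
        for broker in self.brokers:
            if name in broker:
                broker[name].records.append(record)

    def read(self, name, offset):
        # Serve from the first broker that still holds the topic.
        for broker in self.brokers:
            if name in broker:
                # Slicing returns a new list: readers get a copy, not the original.
                return broker[name].records[offset:]
        raise KeyError(f"no broker holds topic {name!r}")


class Reader:
    """A consumer that remembers its own position (offset) in the topic."""

    def __init__(self, library, topic):
        self.library = library
        self.topic = topic
        self.offset = 0

    def poll(self):
        records = self.library.read(self.topic, self.offset)
        self.offset += len(records)
        return records


library = Library(broker_count=3)
library.create_topic("magic", replication=2)
library.publish("magic", "Chapter 1: Levitation")
library.publish("magic", "Chapter 2: Invisibility")

tom = Reader(library, "magic")
jerry = Reader(library, "magic")

tom_copy = tom.poll()        # Tom reads everything published so far
library.brokers[0].clear()   # one building closes and loses its cabinets
jerry_copy = jerry.poll()    # Jerry is served from the surviving replica
```

Tom and Jerry each end up with their own copy of both chapters, even though one building disappeared in between. Each reader tracks its own position, so neither one affects what the other sees.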
To summarize, Kafka is a tool to decouple your application pipelines. One application, called the Producer, puts data in Kafka. Another application, called the Consumer, picks up the data and uses it to meet its business requirements. Any number of applications can use this data. Kafka is just the container that holds the data and provides simple, secure storage with performance benefits.
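The food delivery example from earlier fits this summary well. The sketch below is again plain Python rather than a real Kafka client, and every name in it (`orders_topic`, `place_order`, `make_consumer`, the message texts) is a hypothetical stand-in. It only shows the shape of the idea: the producer records the order and moves on, while notifications and recommendations read the same data independently.

```python
orders_topic = []  # hypothetical stand-in for a Kafka topic: an append-only list


def place_order(user, item):
    """Producer: its only job is to record the order."""
    orders_topic.append({"user": user, "item": item})


def make_consumer(handler):
    """Return a poll function that remembers how far this consumer has read."""
    position = 0

    def poll():
        nonlocal position
        new_records = orders_topic[position:]
        position = len(orders_topic)
        return [handler(record) for record in new_records]

    return poll


# Two independent consumers of the same data, unknown to the producer.
notify = make_consumer(lambda r: f"Hi {r['user']}, your {r['item']} is on its way!")
recommend = make_consumer(lambda r: f"People who ordered {r['item']} also liked fries")

place_order("tom", "pizza")
place_order("jerry", "burger")

notifications = notify()
recommendations = recommend()
```

Notice that `place_order` never mentions notifications or recommendations. A third consumer could be added later without touching the producer, which is the decoupling Harry was after.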
In this article, we looked at a very basic view of Kafka’s design. Of course, Kafka does a lot more than what we saw, and there are many pieces we didn’t cover here. Keep watching this space for more content. Until then, happy reading.