Whenever we want to integrate message brokers into our application which allows us to scale easily and connect our system in an asynchronous fashion, there are many message brokers which can make the list from which you are made to choose one, like:
Each of these message brokers have their own list of pros and cons but the most challenging options are the first two, RabbitMQ and Apache Kafka. In this lesson, we will list down points which can help to narrow down the decision of going with one over other. Finally, it is worth pointing out that none of these is better than another in all use-cases and it completely depends on what you want to achieve, so there is no one right answer!
We will start with a simple introduction of these tools.
As we said in this lesson, Apache Kafka is a distributed, fault-tolerant, horizontally-scalable, commit log. This means that Kafka can perform a divide and rule term very well, it can replicate your data to ensure availability and is highly scalable in the sense that you can include new servers at runtime to increase its capacity to manage more messages.
RabbitMQ is a more general-purpose and simpler to use message broker which itself keeps record about what messages have been consumed by the client and persist the other one. Even if for some reason RabbitMQ server goes down, you can be sure that the messages currently present on queues have been stored on the Filesystem so that when RabbitMQ comes back up again, those messages can be processed by consumers in a consistent manner.
Superpower: Apache Kafka
Kafka’s main superpower is that it is can be used as a queue system but that is not what is limited to. Kafka is something more like a circular buffer that can scale as much as a disk on the machine on the cluster, and thus allows us to be able to re-read messages. This can be done by the client without having to depend on Kafka cluster as it is completely client’s responsibility to note the message metadata it is currently reading and it can revisit Kafka later in a specified interval to read the same message again.
Please note that the time in which this message can be re-read is limited and can be configured in Kafka configuration. So, once that time is over, there is no way a client can read an older message ever again.
RabbitMQ’s main superpower is that it is simply scalable, is a high-performant queuing system which has very well-defined consistency rules, and ability to create many types of message exchange models. For example, there are three types of exchange you can create in RabbitMQ:
- Direct Exchange: One to one exchange of topic
- Topic Exchange: A topic is defined on which various producers can publish a message and various consumers can bind themselves to listen on that topic, so each one of them receives the message which is sent to this topic.
- Fanout exchange: This is more strict than topic exchange as when a message is published on a fanout exchange, all consumers which are connected to queues which binds itself to the fanout exchange will receive the message.
Already noticed the difference between RabbitMQ and Kafka? The difference is, if a consumer is not connected to a fanout exchange in RabbitMQ when a message was published, it will be lost because other consumers have consumed the message, but this doesn’t happen in Apache Kafka as any consumer can read any message as they maintain their own cursor.
RabbitMQ is broker-centric
A good broker is someone who guarantees the work it takes upon itself and that is what RabbitMQ is good at. It is tilted towards delivery guarantees between producers and consumers, with transient preferred over durable messages.
RabbitMQ uses the broker itself to manage the state of a message and making sure that each message is delivered to each entitled consumer.
RabbitMQ presumes that consumers are mostly online.
Kafka is producer-centric
Apache Kafka is producer-centric as it is completely based around partitioning and a stream of event packets containing data and transforming them into durable message brokers with cursors, supporting batch consumers that may be offline, or online consumers that want messages at low latency.
Kafka makes sure that the message remains safe until a specified period of time by replicating the message on its nodes in the cluster and maintaining a consistent state.
So, Kafka doesn’t presume that any of its consumers are mostly online and nor it cares.
With RabbitMQ, the order of publishing is managed consistently and consumers will receive the message in the published order itself. On the other side, Kafka doesn’t do so as it presumes that published messages are heavy in nature so consumers are slow and can send messages in any order, so it doesn’t manage the order in its own as well. Though, we can set up a similar topology to manage the order in Kafka using the consistent hash exchange or sharding plugin., or even more kinds of topologies.
The complete task managed by Apache Kafka is to act like a “shock absorber” between the continuous flow of events and the consumers out of which some are online and others can be offline – only batch consuming on an hourly or even daily basis.
In this lesson, we studied the major differences (and similarities too) between Apache Kafka and RabbitMQ. In some environments, both have shown extraordinary performance like RabbitMQ consume millions of message per second and Kafka has consumed several millions of message per second. The main architectural difference is that RabbitMQ manages its messages almost in-memory and so, uses a big cluster (30+ nodes), whereas Kafka actually makes use of the powers of sequential disk I/O operations and requires less hardware.
Again, the usage of each of them still depends completely on the use-case in an application. Happy messaging !