This is surprisingly basic knowledge for ending up on the front page.
It's a good intro, but I'd love to read more about how to know when it's time to replace my synchronous inter-service HTTP requests with a queue. What metrics should I consider, and what are the trade-offs? I've learned some answers to this question over time, but these guys are theoretically message queue experts. I'd love to learn about more things to look out for.
There are also different types of queues/exchanges, and the right choice is critical depending on the type of consumer or consumers you have. Should I use direct, fan-out, etc.?
The next interesting question is when to use a stream instead of a queue, which RabbitMQ also supports.
My advice, having just migrated a set of message queues and streams from AWS (ActiveMQ) to RabbitMQ, is: think long and hard before you add one. They become a black box of sorts and are way harder to debug than simple HTTP requests.
Also, as others have pointed out, there are other important use cases for queues which come way before microservice comms. Async processing to free up servers is one. I’m surprised none of these were mentioned.
> This is surprisingly basic knowledge for ending up on the front page.
Nothing wrong with that! Hacker News has a large audience of all skill levels. Well-written explainers are always good to share, even for basic concepts.
Agree! In fact, I would appreciate more well-written articles explaining basic concepts on the front page of Hacker News. It is always good to revisit some basic concepts, but it is even better to relearn them. I am surprised by how often I realize that my definition of a concept is wrong or just superficial.
For me, I've realized I often cannot possibly learn something if I can't first compare it to something I already know.
In this case, as another user mentioned, the decoupling use case is a great one. Instead of two processes/APIs talking directly, having an intermediate "buffer" process/API can save you headaches.
The concept of connascence, rather than coupling, is what I find more useful for trade-off analysis.
Synchronous connascence means that you only have a single architectural quantum, under Neal Ford's terminology.
As Ford is less religious and more respectful of real-world trade-offs, I find his writings more useful for real-world problems.
I encourage people to check out his books and see if they are useful. It was always hard to mention connascence, as it has a reputation of being ivory-tower architect jargon, but in a distributed-systems world it is very pragmatic.
> but I'd love to read more about how to know when it's time to replace my synchronous inter-service HTTP requests with a queue. What metrics should I consider, and what are the trade-offs? I've learned some answers to this question over time, but these guys are theoretically message queue experts. I'd love to learn about more things to look out for.
Not OP but I have some background on this.
An Erlang loss system is like a set of phone lines. Imagine a special call center where you have N operators, each of whom takes a call, talks for some time (serving the customer), and hangs up. Unlike many call centers, however, they don't keep you waiting in a queue: if all operators are busy, the system hangs up and you have to explicitly call again. This is somewhat similar to a server with N threads.
Let's assume N=3.
Under common mathematical assumptions (Poisson arrivals at a constant rate, i.e. exponentially distributed times between arrivals, and exponentially distributed service times) you can define:
1) the "traffic intensity" (rho) as the ratio between the arrival rate and the service rate (intuitively, how "heavy" arrivals are with respect to "departures")
2) the blocking probability, given by the Erlang B formula for parameters N (number of threads) and rho (traffic intensity). Basically, if traffic intensity = 1 (arrival rate = service rate), the blocking probability is 6.25%. If the service rate is twice the arrival rate, this drops to approximately 1%. If the service rate is 1/10 of the arrival rate, the blocking probability is about 73.2%.
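For reference, Erlang B is B(N, rho) = (rho^N / N!) / (sum for k = 0..N of rho^k / k!). It's easy to evaluate with the standard iterative recurrence; a minimal Python sketch (the function name is mine):

    def erlang_b(n_servers, rho):
        # Blocking probability of an M/M/N/N loss system.
        # Recurrence: B(0) = 1; B(k) = rho*B(k-1) / (k + rho*B(k-1)).
        b = 1.0
        for k in range(1, n_servers + 1):
            b = rho * b / (k + rho * b)
        return b

    print(erlang_b(3, 1.0))   # 0.0625  -> 6.25%
    print(erlang_b(3, 0.5))   # ~0.0127 -> ~1%
    print(erlang_b(3, 10.0))  # ~0.732  -> ~73.2%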
I will try to write down part 2 when I find some time.
EDIT - Adding part 2
So, let's add a buffer. We said we have three threads, right? Let's say the system can handle up to 6 requests before dropping any: one being processed by each thread, plus an additional 3 buffered requests.
Under the same distribution assumptions, this is known as an M/M/3/6 queue.
Some math crunching under the previous service and arrival rate scenarios:
- if the service rate = the arrival rate, the blocking probability drops to about 0.2%. Of course, there is now a non-zero wait probability (close to 9%).
- if the service rate is twice the arrival rate, the blocking probability is 0.006% and there is a ~1% wait probability.
- if the service rate is 1/10 of the arrival rate, the blocking probability is 70% and the waiting probability is 29%.
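For anyone who wants to reproduce these numbers, they follow from the standard M/M/c/K steady-state probabilities. A minimal Python sketch (the function name is mine):

    from math import factorial

    def mmck_state_probs(c, K, rho):
        # Steady-state probabilities p_0..p_K of an M/M/c/K queue,
        # where rho = arrival rate / per-server service rate.
        coeff = [rho**k / factorial(k) if k <= c
                 else (rho**c / factorial(c)) * (rho / c)**(k - c)
                 for k in range(K + 1)]
        total = sum(coeff)
        return [x / total for x in coeff]

    p = mmck_state_probs(3, 6, 1.0)
    print(p[-1])        # blocking probability: ~0.002 (~0.2%)
    print(sum(p[3:6]))  # wait probability: ~0.088 (~9%)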
This means that a buffer reduces request drops due to busy resources, but also introduces a waiting probability. Pretty obvious. Another obvious thing is that you need additional memory for that queue length. Assuming queue length = 3, and 1 KB messages, you need 3 KB of additional memory.
A less obvious thing is that you are adding a new component.
Assuming "in series" behavior, i.e. requests cannot be processed when the buffer system is down, this decreases overall availability if the queue is not properly sized. What I mean is that, if the system crashes when more than 4 KB of memory are used by the process, but you allow queue sizes up to 3 (3 KB + 3 KB = 6 KB), availability is not 100%, because in some cases the system accepts more requests than it can actually handle.
An even less obvious thing is that the availability picture changes if you consider the server and the buffer as having distinct "size" (memory) thresholds. Things get even more complicated if the server and buffer are connected by a link that itself doesn't have 100% availability, because you also have to take the link's unavailability into account.
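To put a rough number on the series point (hypothetical figures, assuming independent failures): if the server, the broker, and the link between them are each available 99.9% of the time, the serialized path is only about 0.999^3 ≈ 99.7% available, i.e. roughly three times the downtime of the server alone.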
I think the article would be a little bit more useful to non-beginners if it included an update on the modern landscape of MQs. Are people still using Apache Kafka lol?
It is a fine enough article as it is, though!
I've been thinking that defaulting to durable execution over lower-level primitives like queues makes sense a lot of the time. What do you think?
A lot of the "simple queue" use cases end up needing extra machinery like a transactional‑outbox pattern just to be reliable. Durable‑execution frameworks (DBOS/Temporal/etc.) give you retries, state, and consistency out of the box. Patterns like Sagas also tend to get stitched together on top of queues, but a DE workflow gives you the same guarantees with far less complexity.
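For concreteness, the transactional-outbox pattern boils down to committing the business write and the outgoing message in one transaction, with a separate relay publishing from the outbox table. A minimal sketch (SQLite and the table/event names are just for illustration):

    import json
    import sqlite3

    db = sqlite3.connect("app.db")
    db.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total REAL)")
    db.execute("CREATE TABLE IF NOT EXISTS outbox (id INTEGER PRIMARY KEY,"
               " payload TEXT, sent INTEGER DEFAULT 0)")

    def publish(payload):
        print("publish:", payload)  # stand-in for the real broker client

    # 1) The business write and the message commit atomically: both or neither.
    with db:
        cur = db.execute("INSERT INTO orders (total) VALUES (?)", (99.0,))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (json.dumps({"event": "order_created", "order_id": cur.lastrowid}),))

    # 2) Relay (normally a separate process): publish unsent rows, then mark them.
    with db:
        rows = db.execute("SELECT id, payload FROM outbox WHERE sent = 0").fetchall()
        for row_id, payload in rows:
            publish(payload)
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))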
The main tradeoff I can think of is latency: DE engines add overhead, so for very high throughput, huge fan‑out, or ultra‑low‑latency pipelines, a bare‑bones queue + custom consumers might still be better.
Curious where others draw the line between the two.
Highly biased opinion here since I'm the CEO of DBOS:
It'll be rare that the overhead actually has an effect, especially if you use a library like DBOS, which only adds a database write. You still have to write to and read from your queue, which is about as expensive as a database write/read.
While queues definitely play an important role in microservices architecture, I think it’s worth clarifying that they’re not unique to it. A queue can fit perfectly in a monolith depending on the use case. I regularly use queues for handling critical operations that might require retrying, for having better visibility into failed jobs, ensuring FIFO guarantees, and more. Queues are such a useful tool for building any resilient architecture that framing them as primarily a microservices concern might cause unnecessary confusion.
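Even inside a single process, a minimal sketch of that retry + FIFO pattern (Python stdlib; the names and the toy job are illustrative):

    import queue
    import threading

    jobs = queue.Queue()  # FIFO by construction
    MAX_ATTEMPTS = 3

    def worker():
        while True:
            attempt, job = jobs.get()
            try:
                job()  # the critical operation
            except Exception as exc:
                if attempt + 1 < MAX_ATTEMPTS:
                    jobs.put((attempt + 1, job))  # re-enqueue for a retry
                else:
                    print("gave up after retries:", exc)  # failed-job visibility
            finally:
                jobs.task_done()

    threading.Thread(target=worker, daemon=True).start()
    jobs.put((0, lambda: print("charging card")))
    jobs.join()  # returns once every job has succeeded or permanently failed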
I work on PeopleSoft Enterprise Resource Planning applications - the "boring" back-office HR, Pay, Financials, Planning etc stuff.
The core architecture is late 80s - mid 90s. Couple of big architectural changes when internet/browsers and then mobile really hit. But fundamentally it's a very legacy / old school application. Lots of COBOL, if that helps calibrate :->
We use queues pervasively. It's PeopleSoft's preferred integration method for external applications, but over the years a lot of the internal plumbing has moved to queues as well. PeopleSoft Integration Broker is kind of like an internal proprietary ESB. So understanding queues and messaging is key for my PeopleSoft Administrator teams wherever I go (basically sysadmins in service of the PeopleSoft application :).
Recently, I also started using queues for integrating with legacy health care applications. Most of them run on-premises, and they don't accept incoming internet connections for security reasons. The strategy is to send a message to a queue; the consumer application uses short polling to process the messages, and it can then call a webhook to share the status of the job. Do you also follow a similar approach?
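Roughly what I mean, as a minimal sketch (SQS is an assumption here; the queue URL, webhook URL, and process() are placeholders):

    import time

    import boto3     # assuming SQS; any broker you can poll works the same way
    import requests

    sqs = boto3.client("sqs")
    QUEUE_URL = "https://sqs.example.com/jobs"         # placeholder
    WEBHOOK_URL = "https://vendor.example.com/status"  # placeholder

    def process(body):
        return "done"  # placeholder for the real job logic

    while True:
        # Short polling: the on-prem consumer only makes outbound requests,
        # so no inbound internet connection is ever needed.
        resp = sqs.receive_message(QueueUrl=QUEUE_URL,
                                   MaxNumberOfMessages=1, WaitTimeSeconds=0)
        for msg in resp.get("Messages", []):
            status = process(msg["Body"])
            # Report the job status back via an outbound webhook call.
            requests.post(WEBHOOK_URL, json={"id": msg["MessageId"], "status": status})
            sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
        time.sleep(5)  # polling interval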
If I understand it correctly, no; PeopleSoft is legacy in some ways, but it is actively developed and improved/maintained. The PeopleSoft Integration Broker is "modern-ish" from that perspective, and a proper middleware messaging system:
https://docs.oracle.com/cd/E92519_02/pt856pbr3/eng/pt/tibr/c...
It'll do XML messages in a somewhat proprietary format with other PeopleSoft applications, and "near-real-time" queues via web services with other applications in a fairly standardized way (WSDL, etc.). I think of PeopleSoft Integration Broker as a "mini, proprietary ESB", as inaccurate as that may be in the details :).
Monoliths also have to scale to multiple servers eventually, so message queues are an important architectural component to understand regardless of the organization of your services.
The analogy could be: "Queues are like the to-do list of your team. The to-do item (message) stays there until it is successfully completed. It can be handled by the producer itself (monolith) or by someone else (microservices)."
After spending most of my career hacking on these systems, I feel like queues very quickly become a hammer, and every entity becomes a nail.
Just because you can keep two systems in complete sync doesn't mean you should. If you ever find yourself with more-or-less identical tables in two services, you may have gone too far.
Eventually you find yourself backfilling downstream services due to minor domain or business logic changes, and scaling is a problem again.
https://www.softprayog.in/programming/interprocess-communica...
Fun fact: IPC was introduced in "Columbus UNIX."
https://en.wikipedia.org/wiki/CB_UNIX