AxoSyslog internals: flow control, window size, queues, and batching

Your AxoSyslog/syslog-ng pipeline comfortably handles several thousand events per second. Throughput looks healthy. Then someone bumps max-connections() from 10 to 300 to absorb a fleet expansion, and the same pipeline drops to a fraction of its previous throughput. Nothing crashed, yet every batch now sits idle for hundreds of milliseconds before it ships. The culprit is rarely the network or the destination: it's the silent interaction between log-iw-size(), batching, and the in-application ACK chain.

The "Message delivery guarantees in security data pipelines" post explained the theory of ACK chains, flow control, backpressure, in-application acknowledgement, and the "safe place" concept that lets a disk buffer split the chain into smaller parts. This post is the practitioner's follow-up. It walks through how AxoSyslog actually implements those ideas, why a single misconfigured option can cap throughput, and how the knobs (log-iw-size(), flow-control, batch-lines(), batch-timeout(), batch-idle-timeout(), disk-buffer()) interact under the hood.

The log path

A log path in AxoSyslog connects source drivers to destination drivers, with optional processing steps in between:

Source drivers read data from external sources (files, network sockets, system journals, and so on) and produce messages.
Destination drivers send data to external destinations (files, network sockets, HTTP endpoints, OpenTelemetry collectors, and so on).
Processing steps (filter, rewrite, parser, and filterx blocks) transform or route messages as they travel through the log path.

Routing logic is expressed with log, channel, and if-else constructs, with processing steps acting as routing conditions. In an if-else block, a matching filter sends the message through the if branch; a non-matching filter reverts the message and passes it to the else branch.

In-application ACK

A short recap of the theory: an end-to-end ACK chain only holds if every node participates, including every processing step inside a node. A naive implementation that ACKs as soon as one step hands the message to the next step breaks the chain immediately. See the in-application acknowledgement section of the previous post for the longer explanation.

AxoSyslog has full in-application ACK tracking. Every message carries an ownership reference through the pipeline, so the chain stays intact through filters, rewrites, parsers, branching, and multi-destination fan-out. The ACK logic handles both individual ACKs and batched (consecutive) ACKs, depending on what the driver requires. Filter drops are ACKed immediately, because a drop is intentional, complete processing.

This in-application ACK is what makes everything else in this post possible: flow control, backpressure, batching, and disk buffering all depend on knowing exactly which messages are still in flight.

Flow control and `log-iw-size()`

The source driver keeps a counter of how many messages it has emitted but not yet seen an ACK for. The upper bound on that counter is log-iw-size(), configured per source driver. When the counter reaches log-iw-size(), the driver stops reading from the underlying source until ACKs free up window slots.

What happens at the log-iw-size() limit depends on flow-control:

With flow-control on, the source stops reading. Backpressure propagates upstream to the previous hop.
With flow-control off, when a destination queue is full the message is dropped. The drop is ACKed immediately, so the source window slot frees up and reading continues regardless of destination speed. You trade backpressure for data loss.
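
The two behaviors can be sketched with a toy model. This is not AxoSyslog code: the `SourceWindow` class, the `push` and `run` helpers, and all numbers are hypothetical. It also assumes that, with flow-control on, the window rather than the queue capacity bounds what the queue holds, and that the destination is stalled so nothing is ACKed.

```python
class SourceWindow:
    """Toy model of a source's in-flight window (illustrative only)."""

    def __init__(self, log_iw_size: int) -> None:
        self.log_iw_size = log_iw_size
        self.in_flight = 0  # messages emitted but not yet ACKed

    def can_read(self) -> bool:
        return self.in_flight < self.log_iw_size

    def emit(self) -> None:
        self.in_flight += 1

    def ack(self) -> None:
        self.in_flight -= 1


def push(window, queue, queue_capacity, flow_control):
    """Try to read one message and hand it to a stalled destination queue."""
    if not window.can_read():
        return "paused"  # window exhausted: the source stops reading
    window.emit()
    if len(queue) >= queue_capacity and not flow_control:
        window.ack()  # a drop is ACKed immediately, freeing the slot
        return "dropped"
    queue.append("msg")  # stays un-ACKed until the destination delivers it
    return "queued"


def run(flow_control, attempts=5, log_iw_size=3, queue_capacity=2):
    window = SourceWindow(log_iw_size)
    queue = []
    return [push(window, queue, queue_capacity, flow_control)
            for _ in range(attempts)]


print(run(flow_control=True))
print(run(flow_control=False))
```

With flow-control on, the run ends in `paused` once three messages are in flight. With flow-control off, it never pauses, and every message that arrives after the queue fills is dropped.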

Multi-connection sources

For source drivers like network() that accept multiple TCP connections, log-iw-size() is split evenly across connections. Each active connection gets log-iw-size() / max-connections() window slots, so the per-connection in-flight limit shrinks as max-connections() grows.

This static split wastes window slots when the actual number of connections is significantly fewer than max-connections(). dynamic-window-size() distributes the window only across currently active connections. If max-connections() is 300 but only 10 are active, each active connection gets dynamic-window-size() / 10 slots instead of log-iw-size() / 300. As connections open and close, the dynamic window redistributes automatically.
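
The arithmetic is easy to check with a few lines of Python. The function names and the window value of 3000 are hypothetical, and integer division is an assumption about how fractional slots are rounded.

```python
def static_window_per_connection(log_iw_size: int, max_connections: int) -> int:
    """Static split: every possible connection reserves an equal share."""
    return log_iw_size // max_connections


def dynamic_window_per_connection(dynamic_window_size: int,
                                  active_connections: int) -> int:
    """Dynamic split: only currently active connections share the window."""
    if active_connections == 0:
        return 0
    return dynamic_window_size // active_connections


# Hypothetical window of 3000 slots.
print(static_window_per_connection(3000, 10))    # max-connections(10)
print(static_window_per_connection(3000, 300))   # max-connections(300)
print(dynamic_window_per_connection(3000, 10))   # 10 active connections
```

Raising max-connections() from 10 to 300 shrinks the static share from 300 slots to 10, while a dynamic window of the same size still gives each of the 10 active connections 300 slots.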

How backpressure builds

A slow or unavailable destination causes messages to pile up in the destination queue. Because the destination is not consuming the queue, ACKs stop flowing back, un-ACKed messages accumulate, and the source hits log-iw-size(). With flow-control on, it then stops reading. The pause propagates upstream to the previous hop, which eventually hits its own limit, and so on back to the original sender.

Memory buffer vs. disk buffer

By default, each destination worker has an in-memory queue (called fifo internally). The queue acts as a buffer between the source driver and the destination driver, absorbing transient backpressure from a slow destination. The source is only ACKed after the destination driver confirms successful delivery, which keeps the full end-to-end ACK chain intact. When the queue is full, flow-control decides the outcome: the source is slowed down, or the message is dropped.

disk-buffer() is the alternative. When a message lands in the disk buffer, AxoSyslog ACKs the source at that point, breaking the end-to-end ACK chain at the buffer. From the source's perspective the message is safely delivered, even though it has not yet reached the final destination. This is the "safe place" concept covered in the buffers and persistence section of the previous post: as long as the node is not permanently lost, the disk buffer eventually drains and the message reaches the destination.

disk-buffer() has two modes:

reliable(yes):
- The message is always written to disk.
- Messages survive ungraceful shutdowns.
- The source is ACKed at push time into the disk queue.
- An additional in-memory queue sits alongside the disk as a read cache, so workers can consume messages without reading and deserializing from disk every time.
reliable(no):
- Messages go into a memory queue first.
- When the memory queue fills up, subsequent messages go to disk.
- The source is ACKed at push time into the memory queue.
- On a crash, anything in the memory queue that has not been flushed to disk is lost, which can be a significant amount of data.

Batching and destination workers

Certain destination drivers split work between multiple workers. Each worker has its own queues and independently accumulates and sends batches of messages.

A worker keeps adding messages to the current batch until one of these conditions is met:

batch-lines(): the maximum number of messages in a batch. When reached, the worker sends the batch immediately.
batch-size(): the maximum byte size of messages in a batch. When reached, the worker sends the batch immediately.
batch-timeout(): the maximum time to wait since the first message was added to the batch. When it expires, the worker sends whatever the batch contains. This is useful for reducing latency from slow sources.

batch-idle-timeout() is a fourth option, covered separately below because it exists specifically to mitigate the interaction described in the next section.
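
A minimal sketch of the three flush conditions, assuming the time limit counted from the first message is the one the later sections call batch-timeout(). The `Batch` class and its field names are hypothetical and do not mirror AxoSyslog's internal data structures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Batch:
    batch_lines: int      # maximum number of messages
    batch_size: int       # maximum total payload size in bytes
    batch_timeout: float  # seconds, counted from the first message
    messages: List[bytes] = field(default_factory=list)
    first_added_at: Optional[float] = None

    def add(self, payload: bytes, now: float) -> None:
        if not self.messages:
            self.first_added_at = now  # the timeout clock starts here
        self.messages.append(payload)

    def flush_reason(self, now: float) -> Optional[str]:
        """Return the condition that triggers a flush, or None to keep waiting."""
        if not self.messages:
            return None
        if len(self.messages) >= self.batch_lines:
            return "batch-lines"
        if sum(len(m) for m in self.messages) >= self.batch_size:
            return "batch-size"
        if now - self.first_added_at >= self.batch_timeout:
            return "batch-timeout"
        return None


batch = Batch(batch_lines=3, batch_size=100, batch_timeout=1.0)
batch.add(b"first", now=0.0)
print(batch.flush_reason(now=0.5))  # still waiting
print(batch.flush_reason(now=1.0))  # time limit reached
```

Whichever condition is met first wins, so a batch that never fills is still bounded by the time limit.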

The interaction nobody expects: `log-iw-size()` × `batch-lines()`

Here is the thing that nobody expects (well, that and the Spanish Inquisition) and that can cause pipelines to silently lose throughput.

With a memory buffer and flow-control on, messages stay un-ACKed until the destination driver confirms delivery. That means every message currently sitting in a worker's queue, including the time it's waiting for a batch to fill up, counts against log-iw-size().

The total number of messages a destination can hold across all its workers is therefore bounded by log-iw-size(). If the window is too small for the batch configuration, workers cannot fill their batches: there are not enough messages in flight to go around. Each batch waits for batch-timeout(), flushes partially filled, frees up window slots, and the cycle repeats.

A small log-iw-size() produces smaller batches and, more importantly, adds batch-timeout() latency to every cycle. The impact on throughput is dramatic.
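
A back-of-the-envelope model shows why. It assumes each worker repeats a simple cycle of waiting and then sending. The `cycle_throughput` function and every number here are hypothetical, chosen only to show the shape of the effect.

```python
def cycle_throughput(batch_messages: int, wait_seconds: float,
                     send_seconds: float) -> float:
    """Messages per second for one worker that waits, then sends, in a loop."""
    return batch_messages / (wait_seconds + send_seconds)


# Full batches: 100 messages are available at once, so there is no waiting.
full = cycle_throughput(batch_messages=100, wait_seconds=0.0, send_seconds=0.05)

# Starved worker: only 8 messages fit in the window, and the batch sits
# until a 0.5 s timeout expires before it is sent.
starved = cycle_throughput(batch_messages=8, wait_seconds=0.5, send_seconds=0.05)

print(f"full batches: {full:.0f} msg/s, starved: {starved:.1f} msg/s")
```

In this model the starved worker loses throughput twice: each cycle carries fewer messages, and each cycle takes longer.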

Two sizing rules fall out of this:

The worst-case maximum batch size with multi-connection source drivers is log-iw-size() / max-connections() / workers().
To consistently reach batch-lines(), log-iw-size() must be at least max-connections() * batch-lines() * workers().
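
Both rules translate directly into arithmetic. The function names and the example values (a window of 10000, 4 workers, batch-lines(100)) are hypothetical, and integer division is an assumption about rounding.

```python
def worst_case_batch(log_iw_size: int, max_connections: int, workers: int) -> int:
    """Largest batch a worker can count on when the window is split evenly."""
    return log_iw_size // max_connections // workers


def min_log_iw_size(max_connections: int, batch_lines: int, workers: int) -> int:
    """Smallest window that lets every worker consistently reach batch-lines()."""
    return max_connections * batch_lines * workers


# Hypothetical setup: log-iw-size(10000), workers(4), batch-lines(100).
print(worst_case_batch(10000, max_connections=10, workers=4))
print(worst_case_batch(10000, max_connections=300, workers=4))
print(min_log_iw_size(max_connections=300, batch_lines=100, workers=4))
```

With 10 connections the worst case is 250 messages, comfortably above the 100-message target. With 300 connections it falls to 8, and the window would have to grow to 120000 to fill batches again.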

This interaction disappears with disk-buffer(). Because messages ACK when they enter the buffer, the source window only limits how fast the source feeds the disk buffer, not how many messages a worker can hold in a batch. Workers can fill batches up to batch-lines() regardless of log-iw-size().

`batch-idle-timeout()`: a mitigation, not a fix

batch-idle-timeout() exists to soften the latency penalty of an under-configured log-iw-size(). It counts from the last message added to the batch. When no new messages arrive within the timeout, the worker assumes the source has paused and flushes the partial batch early. This reduces latency without lowering batch-timeout(), which still serves as the hard upper bound.

The gap between two consecutive messages can come from a slow sender or from slow processing inside AxoSyslog itself. If a filterx, parser, or rewrite step is slow, the source may have plenty of messages to send but the queue fills slowly. The right value for batch-idle-timeout() sits just above the typical inter-message arrival time at the worker queue, accounting for both sender pace and internal processing.

Both extremes hurt:

Too low: batches flush prematurely even when messages are still coming, defeating the purpose.
Too high: you reintroduce the latency the option was meant to remove.
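
The trade-off can be explored with a small simulation of one batch. This is a sketch under stated assumptions, not the worker's actual timer logic: `flush_time` is a hypothetical helper, and the arrival timestamps are made up.

```python
import math
from typing import Optional, Sequence


def flush_time(arrivals: Sequence[float], batch_lines: int, batch_timeout: float,
               batch_idle_timeout: Optional[float] = None) -> float:
    """Return when the batch that starts with arrivals[0] would be flushed.

    arrivals holds sorted timestamps (seconds) of messages reaching the worker.
    """
    if not arrivals:
        raise ValueError("need at least one arrival")
    idle = math.inf if batch_idle_timeout is None else batch_idle_timeout
    hard_deadline = arrivals[0] + batch_timeout  # counted from the first message
    last = arrivals[0]
    count = 0
    for t in arrivals:
        timer = min(hard_deadline, last + idle)  # idle clock restarts per message
        if t > timer:
            return timer  # a timer fired before this message arrived
        count += 1
        last = t
        if count >= batch_lines:
            return t  # batch is full: send immediately
    return min(hard_deadline, last + idle)


arrivals = [0.0, 0.01, 0.02]  # three messages, then the source goes quiet
print(flush_time(arrivals, batch_lines=100, batch_timeout=1.0))
print(flush_time(arrivals, batch_lines=100, batch_timeout=1.0,
                 batch_idle_timeout=0.05))
print(flush_time(arrivals, batch_lines=100, batch_timeout=1.0,
                 batch_idle_timeout=0.005))
```

Without the idle timeout the partial batch waits the full second. A 50 ms idle timeout flushes it about 70 ms in. A 5 ms idle timeout, shorter than the 10 ms gap between messages, flushes after the first message and leaves the rest for the next batch.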

Practical tuning guidance

A short checklist sums up the previous sections:

Size log-iw-size() for your fan-out. With a memory buffer and flow-control on, you need at least max-connections() * batch-lines() * workers() to hit full batches.
Prefer dynamic-window-size() over a large static log-iw-size() when the actual number of connections is far below max-connections(). You get larger per-connection windows without overprovisioning the total.
Reach for disk-buffer() when you need durability across crashes, or when you want to decouple the ACK chain for better backpressure handling. Accept the disk I/O cost and the loosened end-to-end guarantee.
Use batch-idle-timeout() to defend against slow internal processing, not as a substitute for sizing log-iw-size() correctly.
Turn flow-control off only deliberately. It prevents backpressure at the cost of data loss, and that trade-off should be a conscious decision rather than a leftover default.

Takeaways

The end-to-end ACK chain is the spine of every other behavior in AxoSyslog. Flow control, batching, and buffering all depend on it.
log-iw-size() is not just a buffer size: it bounds in-flight messages across every worker, so it interacts directly with batching.
A pipeline can silently lose throughput when log-iw-size() is smaller than max-connections() * batch-lines() * workers(). The symptom is batch-timeout() latency on every batch, not an error in the logs.
disk-buffer() is the cleanest fix for the log-iw-size() × batch-lines() interaction because it moves the ACK point. It also gives you durability, at the cost of disk I/O.
batch-idle-timeout() mitigates the symptoms of an under-sized window but does not address the root cause. Size the window first; reach for the timeout second.

For the configuration syntax and full option reference, see the AxoSyslog documentation on flow-control and disk-buffer().