A common bottleneck for high-speed data transfers is the high rate of interrupts that the receiving system has to process – traditionally, a network adapter generates an interrupt for each frame that it receives. These interrupts consume signaling resources on the system’s bus(es), and introduce significant CPU overhead as the system transitions back and forth between “productive” work and interrupt handling many thousand times a second.
To alleviate this load, some high-speed network adapters support interrupt coalescence. When multiple frames are received in a short timeframe (“back-to-back”), these adapters buffer those frames locally and only interrupt the system once.
Interrupt coalescence together with large-receive offload can roughly be seen as doing on the “receive” side what transmit chaining and large-send offload (LSO) do for the “transmit” side.
Issues with interrupt coalescence
While this scheme lowers interrupt-related system load significantly, it can have adverse effects on timing, and make TCP traffic more bursty or “clumpy”. Therefore it would make sense to combine interrupt coalescence with on-board timestamping functionality. Unfortunately that doesn’t seem to be implemented in commodity hardware/driver combinations yet.
The way that interrupt coalescence works, a network adapter that has received a frame doesn’t send an interrupt to the system right away, but waits for a little while in case more packets arrive. This can have a negative impact on latency.
In general, interrupt coalescence is configured such that the additional delay is bounded. On some implementations, these delay bounds are specified in units of milliseconds, on other systems in units of microseconds. It requires some thought to find a good trade-off between latency and load reduction. One should be careful to set the coalescence threshold low enough that the additional latency doesn’t cause problems. Setting a low threshold will prevent interrupt coalescence from occurring when successive packets are spaced too far apart. But in that case, the interrupt rate will probably be low enough so that this is not a problem.
Configuration of interrupt coalescence is highly system dependent, although there are some parameters that are more or less common over implementations.
On Linux systems with additional driver support, the
ethtool -C command can be used to modify the interrupt coalescence settings of network devices on the fly.
Some Ethernet drivers in Linux have parameters to control Interrupt Coalescence (Interrupt Moderation, as it is called in Linux). For example, the
e1000 driver for the large family of Intel Gigabit Ethernet adapters has the following parameters according to the kernel documentation:
- limits the number of interrupts per second generated by the card. Values >= 100 are interpreted as the maximum number of interrupts per second. The default value used to be 8’000 up to and including kernel release 2.6.19. A value of zero (
0) disabled interrupt moderation completely. Above 2.6.19, some values between 1 and 99 can be used to select adaptive interrupt rate control. The first adaptive modes are “dynamic conservative” (1) and dynamic with reduced latency (3). In conservative mode (1), the rate changes between 4’000 interrupts per second when only bulk traffic (“normal-size packets”) is seen, and 20’000 when small packets are present that might benefit from lower latency. The more aggressive mode (3), “low-latency” traffic may drive the interrupt rate up to 70’000 per second. This mode is supposed to be useful for cluster communication in grid applications.
- specifies, in multiples of 1’024 microseconds, the time after reception of a frame to wait for another frame to arrive before sending an interrupt.
- bounds the delay between reception of a frame and generation of an interrupt. It is specified in units of 1’024 microseconds. Note that
RxAbsIntDelay, so even when a very short
RxAbsIntDelayis specified, the interrupt rate should never exceed the rate specified (either directly or by the dynamic algorithm) by
- specifies the number of descriptors to store incoming frames on the adapter. The default value is 256, which is also the maximum for some types of E1000-based adapters. Others can allocate up to 4’096 of these descriptors. The size of the receive buffer associated with each descriptor varies with the MTU configured on the adapter. It is always a power-of-two number of bytes. The number of descriptors available will also depend on the per-buffer size. When all buffers have been filled by incoming frames, an interrupt will have to be signaled in any case.
As an example, see the Platform Notes: Sun GigaSwift Ethernet Device Driver. It lists the following parameters for that particular type of adapter:
- Interrupt after this number of packets have arrived since the last packet was serviced. A value of zero indicates no packet blanking. (Range: 0 to 511, default=3)
- Interrupt after 4.5 microsecond ticks have elapsed since the last packet was serviced. A value of zero indicates no time blanking. (Range: 0 to 524287, default=1250)