In version 4.3.0 of NSD, additional functionality was added to increase performance even more. Most notably, this includes processor affinity.
NSD is performant by design because it matters when operators serve hundreds of
thousands or even millions of queries per second. We strive to make the right
choices by default, like enabling the use of
libevent at the configure stage
to ensure the most efficient event mechanism is used on a given platform. e.g.
epoll on Linux and
kqueue on FreeBSD. Switches are available for
operators who know the implementation on their system behaves correctly, like
enabling the use of
recvmmsg at the configure stage
--enable-recvmmsg) to read multiple messages from a socket in one
By default NSD forks (only) one server. Modern computer systems however, may
have more than one processor, and usually have more than one core per processor.
The easiest way to scale up performance is to simply fork more servers by
configuring server-count: to match the number of cores available in the system
so that more queries can be answered simultaneously. If the operating system
supports it, ensure
reuseport: is set to
yes to distribute incoming
packets evenly across server processes to balance the load.
A couple of other options that the operator may want to consider:
Memory usage can be lowered (around 50%) by using zone files and disable the on-disk database by setting
TCP capacity can be significantly increased by setting
tcp-timeout: 3. Set
tcp-reject-overflow: yesto prevent the kernel connection queue from growing.
The aforementioned settings provide an easy way to increase performance without
the need for in-depth knowledge of the hardware. For operators that require even
cpu-affinity is available.
The operating system’s scheduling-algorithm determines which core a given task
is allocated to. Processors build up state — e.g. by keeping frequently accessed
data in cache memory — for the task that it is currently executing. Whenever a
task switches cores, performance is degraded because the core it switched to has
yet to build up said state. While this scheduling-algorithm works just fine for
general-purpose computing, operators may want to designate a set of cores for
best performance. The
cpu-affinity family of configuration options was added
to NSD specifically for that purpose.
Processor affinity is currently supported on Linux and FreeBSD. Other operating
systems may be supported in the future, but not all operating systems that can
run NSD support CPU pinning. To fully benefit from this feature, one must first
determine which cores should be allocated to NSD. This requires some knowledge
of the underlying hardware, but generally speaking every process should run on a
dedicated core and the use of Hyper-Threading cores should be avoided to prevent
resource contention. List every core designated to NSD in
bind each server process to a specific core using
xfrd-cpu-affinity to improve L1/L2 cache hit rates and reduce pipeline
cpu-affinity: 0 1 2
ip-address: options in the
server: clause can be configured per server
or set of servers. Sockets configured for a specific server are closed by other
servers on startup. This improves performance if a large number of sockets are
select/poll and avoids waking up multiple servers when a
packet comes in, known as the thundering herd problem. Though both problems
are solved using a modern kernel and a modern I/O event mechanism, there is one
other reason to partition sockets, explained below.
ip-address: 192.0.2.1 servers=1
Bind to Device¶
ip-address: options in the server: clause can now also be configured to bind
directly to the network interface device on Linux (
bindtodevice=yes) and to
use a specific routing table on FreeBSD (
setfib=<N>). These were added to
ensure UDP responses go out over the same interface the query came in on if
there are multiple interfaces configured on the same subnet, but there may be
some performance benefits as well as the kernel does not have to go through the
network interface selection process.
ip-address: 192.0.2.1 bindtodevice=yes setfib=<N>
FreeBSD does not create extra routing tables on demand. Consult the FreeBSD Handbook, forums, etc. for information on how to configure multiple routing tables.
Field tests have shown best performance is achieved by combining the aforementioned options so that each network interface is essentially bound to a specific core. To do so, use one IP address per server process, pin that process to a designated core and bind directly to the network interface device.
cpu-affinity: 0 1 2
ip-address: 192.0.2.1 servers=1 bindtodevice=yes setfib=1
ip-address: 192.0.2.2 servers=2 bindtodevice=yes setfib=2
The above snippet serves as an example on how to use the configuration options. Which cores, IP addresses and routing tables are best used depends entirely on the hardware and network layout. Be sure to test extensively before using the options.