Some time ago I was involved in the analysis of a networking problem whose symptom was client-side timeouts (no answer received). The protocol was standardized and based on UDP.
Let's first analyze the flow of information from the network adapter towards the application. There is an excellent document provided by Red Hat as part of their OS documentation - it can be found here. I recommend reading chapter 8, which covers networking. In general, the flow looks as presented in the picture below:
Hardware/interrupt buffers
These are internal HW buffers inside the network interface card (NIC). If you see HW buffer overruns on your system, the only thing you will know is that the transfer rate is too high for this interface. The document mentioned above gives an indication of how to monitor this using ethtool and what you can do to minimize the impact.
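For example, ethtool can show the per-driver drop counters and the RX ring size, and enlarge the ring if the driver allows it (the exact statistic names vary between drivers):

$ ethtool -S eth0 | grep -i drop
$ ethtool -g eth0
$ sudo ethtool -G eth0 rx 4096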
You can also monitor this using tools that read statistics from the NIC, such as ifconfig. Below you can find an example from my Ubuntu box (overruns indicates the number of times a HW buffer overrun happened):
$ ifconfig eth0
eth0 Link encap:Ethernet HWaddr b4:99:ba:e8:d3:6c
UP BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
Interrupt:20 Memory:d7500000-d7520000
UDP socket buffers
The document mentioned above explains a lot about what can be tuned. Here I will only try to explain how it can happen that the receive buffer queue grows. To do that I prepared a sample Java application (a modification of a UDP echo client/server app). It consists of a server application and a client application.
Test application
The client will send a UDP datagram every 10 milliseconds, containing a 64-byte string (the word "test" repeated 16 times).
The server, on the other hand, will be a single-threaded application reading from the socket. After retrieving a datagram from the socket, the reading thread will sleep for 1 second - 1 s is used to show the effect in a shorter timeframe, but note that any value above 10 ms (the sleep interval on the client side) will cause this behaviour. In general, such a problem might happen in a real-life scenario if the same thread that reads from the socket also does some processing.
Code for the UDP echo client:
package org.krystianek.udp.test.echo;

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketException;

public class EchoClient {

    public static void main(String[] args) throws Exception {
        String hostname = "localhost";
        InetAddress ia = InetAddress.getByName(hostname);
        SenderThread sender = new SenderThread(ia, 9999);
        sender.start();
    }
}

class SenderThread extends Thread {

    private InetAddress server;
    private DatagramSocket socket;
    private volatile boolean stopped = false;
    private int port;

    public SenderThread(InetAddress address, int port) throws SocketException {
        this.server = address;
        this.port = port;
        this.socket = new DatagramSocket();
        this.socket.connect(server, port);
    }

    public void halt() {
        this.stopped = true;
    }

    public DatagramSocket getSocket() {
        return this.socket;
    }

    public void run() {
        try {
            // "test" repeated 16 times = 64 bytes
            String theLine = "testtesttesttesttesttesttesttesttesttesttesttesttesttesttesttest";
            while (!stopped) {
                byte[] data = theLine.getBytes();
                DatagramPacket output = new DatagramPacket(data, data.length, server, port);
                System.out.println("Sending : " + output.getData().length + " bytes ");
                socket.send(output);
                Thread.sleep(10); // send a datagram every 10 ms, as described above
            }
        } catch (Exception ex) {
            System.err.println(ex);
        }
    }
}
Code for the UDP echo server:
package org.krystianek.udp.test.echo;

import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;

/**
 * @author krychu
 */
public class EchoServer {

    public static void main(String[] args) throws Exception {
        DatagramChannel channel = DatagramChannel.open();
        DatagramSocket socket = channel.socket();
        SocketAddress address = new InetSocketAddress(9999);
        socket.bind(address);
        ByteBuffer buffer = ByteBuffer.allocateDirect(65507); // max UDP payload size
        while (true) {
            SocketAddress client = channel.receive(buffer);
            try {
                // Simulate slow processing in the reading thread.
                Thread.sleep(1000);
            } catch (Exception ex) {
                ex.printStackTrace();
            }
            // The echo reply is commented out on purpose - the server only consumes.
            // buffer.flip();
            // System.out.println("Server received : " + buffer);
            // channel.send(buffer, client);
            buffer.clear(); // reset the buffer for the next datagram
        }
    }
}
For the test, the classes were packaged into echo-0.0.1-SNAPSHOT.jar using the Maven build system.
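Assuming a standard pom.xml, the jar can be rebuilt with:

$ mvn package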
Monitoring the system
The next step was to identify how to monitor the buffer utilization. There are different ways; in this article I will use the statistics available in the proc filesystem. A quite good description can be found in the man page:
$ man proc
...
/proc/net/udp
Holds a dump of the UDP socket table. Much of the information is not of use apart from debugging. The "sl" value is the kernel hash slot for the socket, the "local_address" is the local address and port number pair. The "rem_address" is the remote address and port number pair (if connected). "St" is the internal status of the socket. The "tx_queue" and "rx_queue" are the outgoing and incoming data queue in terms of kernel memory usage. The "tr", "tm->when", and "rexmits" fields are not used by UDP. The "uid" field holds the effective UID of the creator of the socket. The format is:
sl local_address rem_address st tx_queue rx_queue tr rexmits tm->when uid
1: 01642C89:0201 0C642C89:03FF 01 00000000:00000001 01:000071BA 00000000 0
1: 00000000:0801 00000000:0000 0A 00000000:00000000 00:00000000 6F000100 0
1: 00000000:0201 00000000:0000 0A 00000000:00000000 00:00000000 00000000 0
...
There are two files available: one for IPv4 (/proc/net/udp) and one for IPv6 (/proc/net/udp6). Example output can be found below:
$ cat /proc/net/udp
sl local_address rem_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
564: 00000000:0801 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 13276 2 0000000000000000 0
601: 00000000:E026 00000000:0000 07 00000000:00000000 00:00000000 00000000 118 0 12564 2 0000000000000000 0
...
1833: 00000000:9CF6 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 13808 2 0000000000000000 0
1884: 00000000:ED29 00000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 1145241 2 0000000000000000 0
$
$ cat /proc/net/udp6
sl local_address remote_address st tx_queue rx_queue tr tm->when retrnsmt uid timeout inode ref pointer drops
2: 00000000000000000000000000000000:ADCF 00000000000000000000000000000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 13287 2 0000000000000000 0
37: 00000000000000000000000000000000:9DF2 00000000000000000000000000000000:0000 07 00000000:00000000 00:00000000 00000000 104 0 10786 2 0000000000000000 0
...
1820: 00000000000000000000000000000000:14E9 00000000000000000000000000000000:0000 07 00000000:00000000 00:00000000 00000000 104 0 10784 2 0000000000000000 0
2024: 00000000000000000000000000000000:B5B5 00000000000000000000000000000000:0000 07 00000000:00000000 00:00000000 00000000 0 0 13796 2 0000000000000000 0
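To illustrate how these entries can be decoded, below is a minimal Java sketch (the class name ProcUdpReader is mine, not part of the test application) that extracts the local port, rx_queue and drops columns from /proc/net/udp; the hex fields are parsed exactly as described in the man page excerpt above:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ProcUdpReader {

    public static void main(String[] args) throws IOException {
        for (String line : Files.readAllLines(Paths.get("/proc/net/udp"))) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 13 || !f[0].endsWith(":")) {
                continue; // skip the header line
            }
            // local_address has the form "hex-ip:hex-port"
            int localPort = Integer.parseInt(f[1].split(":")[1], 16);
            // tx_queue and rx_queue are hex values joined by ':'
            long rxQueue = Long.parseLong(f[4].split(":")[1], 16);
            long drops = Long.parseLong(f[12]);
            System.out.printf("port=%d rx_queue=%dB drops=%d%n", localPort, rxQueue, drops);
        }
    }
}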
In addition, I developed a simple Ruby script for parsing the UDP stats.
The full parameter set for the utility:
$ udpstat
Usage: udpstat [options]
Specific options:
-6, --ipv6 Enable IPv6 stat parsing
-K, --kilobytes Present values in kilobytes
-M, --megabytes Present values in megabytes
-p, --ppid pid PID of the process to be monitored
Common options:
-h, --help Show this message
-v, --version Show version
- interval in which statistics are to be reported
- the number of statistics to be shown
@2013 Krystian Brachmanski, Wroclaw
An example dump can be found below:
$ ./measurements/perftools/udpstat -6 1 2
Timestamp Local IP:port Remote IP:port rx[B] rbuf[B] tx[B] tbuf[B] drops
1353956875 00000000000000000000000000000000:44495 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 00000000000000000000000000000000:40434 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 00000000000000000000000000000000:24105 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 00000000000000000000000000000000:36499 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 00000000000000000000000000000000:51186 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 00000000000000000000000000000000:2049 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 00000000000000000000000000000000:111 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 00000000000000000000000001000000:123 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 000080FE00000000FFD724029C55BDFE:123 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 00000000000000000000000000000000:123 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 00000000000000000000000000000000:45593 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 00000000000000000000000000000000:612 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 00000000000000000000000000000000:5353 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956875 00000000000000000000000000000000:46517 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000000000000:44495 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000000000000:40434 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000000000000:24105 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000000000000:36499 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000000000000:51186 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000000000000:2049 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000000000000:111 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000001000000:123 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 000080FE00000000FFD724029C55BDFE:123 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000000000000:123 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000000000000:45593 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000000000000:612 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000000000000:5353 00000000000000000000000000000000:0 0 1048576 0 1048576 0
1353956876 00000000000000000000000000000000:46517 00000000000000000000000000000000:0 0 1048576 0 1048576 0
You can find the utility on Google Code as part of the yapcg tool:
http://code.google.com/p/yapcg/source/browse/measurements/perftools/udpstat
Testing the UDP buffers
Configuration of the test application (it is actually hard-coded):
- UDP port: 9999
- listening on an IPv6 socket
- system settings:
net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
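These settings can be inspected and changed at runtime with sysctl, e.g.:

$ sysctl net.core.rmem_default net.core.rmem_max
$ sudo sysctl -w net.core.rmem_max=1048576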
1. Starting the Echo UDP server in the background:
$ java -cp ./target/echo-0.0.1-SNAPSHOT.jar org.krystianek.udp.test.echo.EchoServer &
2. Starting the udpstat monitor for IPv6 UDP sockets in the background, presenting the statistics in kilobytes:
$ ./udpstat -6 -K 1 100000 > udpstat.out 2>&1 &
...
3. Starting the client:
$ java -cp ./target/echo-0.0.1-SNAPSHOT.jar org.krystianek.udp.test.echo.EchoClient
4. Analysis of the results. If we take a look at the generated output:
$ cat udpstat.out
Timestamp Local IP:port Remote IP:port rx[KB] rbuf[KB] tx[KB] tbuf[KB] drops
...
1358196897 00000000000000000000000000000000:9999 00000000000000000000000000000000:0 0 1024 0 1024 0
...
1358196898 00000000000000000000000000000000:9999 00000000000000000000000000000000:0 8 1024 0 1024 0
...
1358196902 00000000000000000000000000000000:9999 00000000000000000000000000000000:0 38 1024 0 1024 0
...
1358196906 00000000000000000000000000000000:9999 00000000000000000000000000000000:0 68 1024 0 1024 0
...
1358196909 00000000000000000000000000000000:9999 00000000000000000000000000000000:0 90 1024 0 1024 0
...
1358196910 00000000000000000000000000000000:9999 00000000000000000000000000000000:0 97 1024 0 1024 0
...
1358196917 00000000000000000000000000000000:9999 00000000000000000000000000000000:0 149 1024 0 1024 0
...
1358196928 00000000000000000000000000000000:9999 00000000000000000000000000000000:0 230 1024 0 1024 0
...
1358197036 00000000000000000000000000000000:9999 00000000000000000000000000000000:0 1024 1024 0 1024 9
1358197037 00000000000000000000000000000000:9999 00000000000000000000000000000000:0 1024 1024 0 1024 8
1358197038 00000000000000000000000000000000:9999 00000000000000000000000000000000:0 1024 1024 0 1024 7
1358197039 00000000000000000000000000000000:9999 00000000000000000000000000000000:0 1024 1024 0 1024 7
...
As expected, the receive socket buffer starts to fill up as soon as the UDP echo client starts, since the transmission rate of the client is much higher than the reception rate of the server (the server reads one datagram per second, while the client transmits one hundred 64-byte datagrams per second). In this run the rx_queue grows by roughly 7-8 KB per second; the kernel accounts more memory per queued datagram than the 64-byte payload alone. As soon as the receive socket buffer utilization reaches the maximum defined in the kernel parameters, packet drops start to happen (visible in the drops column).
Everything seems to be working as expected...
Conclusions
When designing an application that processes high volumes of data, always make sure that the thread reading from the socket is not doing any additional, unnecessary processing, since that may drive your socket buffer utilization up and eventually cause packet drops. If additional processing/decoding is necessary, a thread pool can still solve the buffer issue.
In my opinion the general rule of thumb should be to make the socket-reading layer as lightweight as possible. Starting from the single-threaded approach, one could introduce a queue to which the reader thread puts the payload, while a pool of worker threads reads from the queue and continues with the processing.
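A minimal sketch of that idea is shown below. The class name QueuedUdpServer, the queue capacity and the pool size are arbitrary choices for illustration, not part of the test application above:

package org.krystianek.udp.test.echo;

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class QueuedUdpServer {

    public static void main(String[] args) throws Exception {
        BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(10000);
        ExecutorService workers = Executors.newFixedThreadPool(4);
        // The worker pool does the slow processing, decoupled from the socket.
        for (int i = 0; i < 4; i++) {
            workers.submit(() -> {
                try {
                    while (true) {
                        byte[] payload = queue.take(); // blocks until data arrives
                        process(payload);
                    }
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // The reader thread does nothing but drain the socket.
        DatagramSocket socket = new DatagramSocket(9999);
        byte[] buf = new byte[65507];
        while (true) {
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            socket.receive(packet);
            // Copy the payload out of the reusable buffer and hand it off;
            // offer() drops the datagram if the in-process queue is full
            // instead of blocking the reader.
            queue.offer(Arrays.copyOfRange(packet.getData(), 0, packet.getLength()));
        }
    }

    private static void process(byte[] payload) {
        // Placeholder for the expensive decoding/processing step.
    }
}

With this split the socket is drained as fast as the reader thread can loop, and any backpressure shows up in the in-process queue (which you control) rather than in the kernel socket buffer.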