Sunday, December 7, 2014

UDP binding and port reuse in Linux

A recent technical challenge required me to dig deeply into how UDP ports are "bound" - that is, reserved or allocated - in the Linux TCP/IP implementation. It ended up being one of those cases where I had an intuition about how things worked, then found some evidence suggesting that my intuition was wrong, and in the end discovered it was correct after all, but in a different way than I'd expected. Along the way, I wrote and nearly published a different post that would have perpetuated some misperceptions about UDP port binding in Linux. I am writing this post instead, in an attempt to promulgate correct information.

If you're in a hurry to get to the punchline, it's this: you CAN bind more than one UDP socket to the same address and port in Linux; skip to the section titled Testing the solution to see how.

Background

In my last post, I mentioned working on a secondary controller, or "payload," for small autonomous aircraft. Our research team uses ROS to connect a variety of interrelated software components that run on this payload. The component I discussed before is the "bridge" between ROS and the aircraft autopilot; another component is the bridge between ROS and the wireless network that connects that aircraft to all other flying aircraft as well as to ground control stations.

We currently use 802.11n wireless devices in ad hoc mode, and optionally a mesh routing protocol (such as B.A.T.M.A.N. Advanced) that enables aircraft to act as relays, repeating messages on behalf of other aircraft that are out of range of direct transmission. Our command and status-reporting protocol is built on top of UDP, and we use either IP unicast or IP broadcast depending on the type of message being sent. Command messages from ground control stations to aircraft may be either unicast or broadcast; reports from each aircraft are always broadcast because other aircraft need to know its position for formation flight and collision avoidance.

To test software before we fly it, we run one or multiple Simulation-In-The-Loop (SITL) instances on one or more computers; each SITL instance includes the autopilot software, a simulated flight dynamics model, and the payload software. Because each instance of the payload software needs to communicate via UDP unicast and broadcast, both with other SITLs on the same computer and SITLs on other computers, we need a way to open multiple UDP sockets that can send and receive broadcasts to each other on the same port at the same time. Whether or not this is supported turns out to be a matter of great confusion.

The 60-second-or-so guide to UDP broadcasts (in Python)

As I noted in my previous post, most of the team develops in Python. Sending and receiving UDP broadcasts in Python is quite easy; to set up a socket and send a datagram to a broadcast IP address and arbitrary UDP port is all of four lines of code (not counting error-handling):


import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
sock.sendto('Hello world', ('192.168.1.255', 1234))

These lines: 1) import the socket library, 2) create a UDP socket, 3) enable the socket to send to broadcast addresses (which is disallowed by default), and 4) send a datagram containing the string "Hello world" to the broadcast IP address 192.168.1.255 at UDP port 1234.

Note that whether 192.168.1.255 is considered a "broadcast address" depends on the local machine's network configuration. If, for instance, that machine has a network interface with IP address 192.168.1.123 and network mask of 255.255.255.0 then 192.168.1.255 is indeed a broadcast address - it is the highest-numbered address that falls within the "masked" network portion (192.168.1) of the address. However, if that network interface instead has a mask of 255.255.0.0 then 192.168.1.255 is not a broadcast address - 192.168.255.255 is though!
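To make the mask arithmetic concrete, here is a small sketch that computes the broadcast address from an interface address and network mask. The helper name broadcast_address is my own invention for illustration; it uses only the standard socket and struct modules and should run on both Python 2.7 and Python 3.

```python
import socket
import struct

def broadcast_address(ip, netmask):
    """Return the broadcast address for an IPv4 address and network mask."""
    ip_int = struct.unpack('!I', socket.inet_aton(ip))[0]
    mask_int = struct.unpack('!I', socket.inet_aton(netmask))[0]
    # Keep the network bits and set every host bit to 1.
    bcast_int = (ip_int & mask_int) | (~mask_int & 0xFFFFFFFF)
    return socket.inet_ntoa(struct.pack('!I', bcast_int))

print(broadcast_address('192.168.1.123', '255.255.255.0'))  # 192.168.1.255
print(broadcast_address('192.168.1.123', '255.255.0.0'))    # 192.168.255.255
```

The two calls reproduce the two cases described above: the same interface address yields a different broadcast address once the mask changes.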

(Here are some more thorough explanations of network masks for the interested reader.)

To receive broadcast datagrams on a particular UDP port (e.g., 1234), we must "bind" our socket to 1234, informing the operating system's network stack that any received datagrams where the destination UDP port is 1234 should be provided to our socket. There is one more requirement: the binding requires an IP address as well, and this must match the destination IP address of the received datagram. Because the destination broadcast address (e.g., 192.168.1.255) is different from our IP address (e.g., 192.168.1.123), we cannot use our own IP here. Instead we can use 0.0.0.0 (known as INADDR_ANY in the C sockets API; in Python, the empty string '' means the same thing), which is a wildcard address matching datagrams destined to any IP address on our computer, plus their corresponding broadcast addresses:

import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('', 1234))
data, (ip, port) = sock.recvfrom(1024, socket.MSG_DONTWAIT)

(Okay, there are actually more subtleties to the "any" address, rules regarding what are sometimes called "Martian" packets, and so on. If you're fortunate enough to deal only with "normal" use cases, however, this explanation will generally suffice.)

Notice the double parentheses: for IPv4 (AF_INET) sockets, bind() takes a single tuple containing an address and a port. Receiving datagrams is done using recvfrom(), which returns a tuple of the form (data, (source_ip, source_port)). The function requires us to specify the maximum message payload we will accept (e.g., 1024) and, optionally, any flags we want. Here, I've specified the MSG_DONTWAIT flag, which tells the system not to wait if there are no datagrams immediately ready to be returned; in that case Python raises socket.error (errno EAGAIN, "Resource temporarily unavailable") instead of returning data. The default behavior is to block (wait) until a datagram arrives.
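When recvfrom() is called with MSG_DONTWAIT and nothing is waiting, Python raises socket.error with errno EAGAIN (or EWOULDBLOCK), so a polling loop usually wants to catch that case. The following is a minimal sketch of one way to do so; the helper name try_receive is hypothetical, and the socket is bound to the loopback address on an OS-chosen port purely so the example is harmless to run. MSG_DONTWAIT is available on Linux but not on every platform.

```python
import errno
import socket

def try_receive(sock, max_bytes=1024):
    """Return (data, (ip, port)) if a datagram is ready, else None."""
    try:
        return sock.recvfrom(max_bytes, socket.MSG_DONTWAIT)
    except socket.error as e:
        # "No datagram ready" is reported as EAGAIN/EWOULDBLOCK.
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            return None
        raise  # anything else is a genuine error

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('127.0.0.1', 0))   # port 0: let the OS pick a free port
result = try_receive(sock)    # nothing has been sent to this socket
sock.close()
print(result)
```

Returning None for "nothing yet" is a design choice of this sketch, not something the socket library does for you.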

Multiple broadcasting sockets sharing a port

Per the use case I described above, we want to send broadcasts on the local machine, and have multiple other sockets (also on the same machine) receive those broadcasts.

Using the code above, we can verify that, on the same machine, one socket will receive another socket's broadcasts - that is, that broadcasts are "reflected" back on the local machine. In one terminal:

$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
>>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
>>> s.sendto('Hello world', ('192.168.1.255', 1234))
11
>>>

And in a second terminal:

$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
>>> s.bind(('', 1234))
>>> s.recvfrom(1024)
('Hello world', ('192.168.1.123', 43296))
>>> 

Here we see that nested tuple I mentioned above: the text "Hello world" followed by another tuple containing the datagram's source IP address (which in this case is our own address) and an ephemeral (automatically-assigned by the operating system) UDP port.

Now, let's try to set up another listening socket bound to the same port, in a third terminal. I would like to see it get its own copy of the received broadcast:

$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
>>> s.bind(('', 1234))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
socket.error: [Errno 98] Address already in use
>>> 

Whoa, no dice! As it turns out, most TCP/IP implementations by default disallow binding more than one socket to the same (address, port) pair at once. If we had a second interface with a different address, such as 192.168.2.234, we could simultaneously bind (192.168.1.123, 1234) and (192.168.2.234, 1234). This doesn't solve our problem, however, since we need to bind the wildcard address for each broadcast-receiving socket!
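The same failure can be reproduced in a single script, which is handy for regression tests. This is only a sketch: it asks the OS for a free port (by binding to port 0) instead of hard-coding 1234, and the variable names are my own.

```python
import errno
import socket

first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
first.bind(('', 0))                 # wildcard address, OS-chosen port
port = first.getsockname()[1]

second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
try:
    second.bind(('', port))         # same (address, port) pair, no options set
    conflict = None
except socket.error as e:
    conflict = e.errno              # expected: EADDRINUSE ("Address already in use")
finally:
    second.close()
    first.close()

print(conflict == errno.EADDRINUSE)
```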

SO_REUSEADDR

This is where things start to get confusing. Looking at the Linux socket(7) man page, there is an option named SO_REUSEADDR (set the same way as SO_BROADCAST above) that would appear to solve this by allowing us to reuse the (address, port) pair across multiple bindings. Somewhere in the back of my mind, I recalled encountering such a solution in the past - just set a special option on the socket, and problem solved.

However, the man page offers the following verbiage about this option:
For AF_INET sockets this means that a socket may bind, except when there is an active listening socket bound to the address. When the listening socket is bound to INADDR_ANY with a specific port then it is not possible to bind to this port for any local address.
This is a double-whammy against us: not only can we not bind a second socket to the same address and port once the first socket is bound ("actively listening"), using the wildcard address in the first binding ties up all addresses for that port. (Because all broadcast-receiving sockets must use the wildcard address, this second issue is a bit of a moot point, but we still don't have to like it!)

Multiple forum posts reaffirm this information, stating that one cannot reuse an (address, port) pair already bound by another socket. After reading all this, I ended up going down some rather-interesting-in-their-own-right rabbit holes, one involving Linux network namespaces and another using a UDP proxy and binding each socket to a unique port, and came up with two viable but kludgy solutions to this problem. (The network namespaces solution is actually pretty cool and probably merits its own post.)

But then I came across what might be the most thorough StackOverflow answer ever written, which contradicted this information and ultimately rectified a fundamental misunderstanding about the SO_REUSEADDR option. Skipping to the part of the answer most relevant to UDP and Linux (emphasis added):
Prior to Linux 3.9, only the option SO_REUSEADDR existed. This option behaves generally the same as in BSD with two important exceptions ... The second exception is that for UDP sockets this option behaves exactly like SO_REUSEPORT in BSD, so two UDP sockets can be bound to exactly the same address and port combination as long as both had this flag set before they were bound.
As it turns out, under Linux we can bind multiple UDP sockets to the same address and port! (And, as it turns out, my memory was of using the SO_REUSEPORT option on FreeBSD long, long ago.)

Yet both the man page and claims in several online forums disagree. Why? Because, in Linux specifically, SO_REUSEADDR has different effects on TCP and UDP sockets. The behavior for TCP, which is more strict, is what has been documented and hence is what is frequently cited in forums. Unfortunately, the documentation appears (at least as of any versions I have ready access to) never to have been updated with this important caveat.

We can examine the Linux kernel code to confirm this behavior (only after reading this answer did I think to do this). I happened to have Ubuntu-patched 3.2.0 kernel source handy, but the pertinent lines shouldn't have moved around too much in more recent versions (right? maybe.).

First, let's find where the SO_REUSEADDR option is handled:

$ grep -rn SO_REUSEADDR .
...
./net/core/sock.c:512:    case SO_REUSEADDR:
./net/core/sock.c:820:    case SO_REUSEADDR:
...


Two of the references are clearly part of a larger switch statement, so let's look in that file. The first instance, at line 512 of sock.c, is conveniently part of the setsockopt() call:

...
512     case SO_REUSEADDR:
513             sk->sk_reuse = valbool;
514             break;

...

So, the attribute sk_reuse of socket struct sk gets set by this case. We'd like to find out where it gets used in the context of UDP:

$ grep -rn sk_reuse .
...
./net/ipv4/udp.c:144:            (!sk2->sk_reuse || !sk->sk_reuse) &&
./net/ipv4/udp.c:176:            (!sk2->sk_reuse || !sk->sk_reuse) &&
...


These two references in udp.c appear in functions udp_lib_lport_inuse() and udp_lib_lport_inuse2(), respectively. Both functions take a socket struct sk (presumably the one we're trying to bind) and check whether sk conflicts with an already-bound UDP socket sk2. Both functions check a variety of conditions, including whether both sockets have the sk_reuse flag set. If at least one of the two sockets lacks the flag and certain other conditions are met (the port numbers match, the sockets are not bound to different specific interfaces, et cetera), then the functions return true (that is, the port does conflict with an in-use socket).
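To make the logic of that check easier to follow, here is a simplified Python model of it. This is not kernel code: the Sock tuple, the addresses_overlap helper, and the field names other than sk_reuse are hypothetical stand-ins, and the interface and address checks are my paraphrase of the "other conditions" and should be treated as assumptions. It ignores network namespaces and other details entirely.

```python
from collections import namedtuple

# Hypothetical stand-in for the kernel's socket struct.
Sock = namedtuple('Sock', 'port addr sk_reuse bound_dev')

WILDCARD = '0.0.0.0'

def addresses_overlap(a, b):
    # A wildcard binding overlaps every address; otherwise they must match.
    return a.addr == WILDCARD or b.addr == WILDCARD or a.addr == b.addr

def port_in_use(sk, bound_sockets):
    """Return True if binding sk would conflict with an existing socket."""
    for sk2 in bound_sockets:
        if (sk2.port == sk.port
                # The check seen in udp.c: a conflict needs at least one
                # of the two sockets to have SO_REUSEADDR unset.
                and (not sk2.sk_reuse or not sk.sk_reuse)
                # Sockets tied to different devices do not conflict.
                and (not sk2.bound_dev or not sk.bound_dev
                     or sk2.bound_dev == sk.bound_dev)
                and addresses_overlap(sk, sk2)):
            return True
    return False

existing = [Sock(1234, WILDCARD, True, None)]
print(port_in_use(Sock(1234, WILDCARD, True, None), existing))   # False
print(port_in_use(Sock(1234, WILDCARD, False, None), existing))  # True
```

The point of the model is the second condition: when both sockets have the flag set, the whole expression is false and no conflict is reported, whatever the addresses are.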

This validates our new information about SO_REUSEADDR and gives us hope that our binding problem may be easily solved.

Testing the solution

We add one line of code to our simple Python UDP receiver, before the call to bind():

import socket
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(('', 1234))
data, (ip, port) = sock.recvfrom(1024, socket.MSG_DONTWAIT)

We can rerun our earlier experiment and test whether more than one receiver can successfully bind to the same address and port, and whether all receivers get their own copy of the broadcast message (the following shows only the receivers, with the message being sent by the same sender code from above after both receivers finish binding and call recvfrom()):

Receiver #1:

$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
>>> s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
>>> s.bind(('', 1234))
>>> s.recvfrom(1024)
('Hello world', ('192.168.1.123', 35162))
>>>

Receiver #2:

$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
>>> s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
>>> s.bind(('', 1234))
>>> s.recvfrom(1024)
('Hello world', ('192.168.1.123', 35162))
>>>

Success! All that was needed was to set SO_REUSEADDR before binding on each and every socket.
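For a quick check without opening several terminals, the binding half of the experiment fits in one script. This is a sketch that assumes Linux semantics for SO_REUSEADDR on UDP sockets (other platforms may refuse the second bind); the helper name open_shared_receiver is hypothetical, and the port is chosen by the OS instead of being fixed at 1234. It only verifies that both binds succeed; it does not send a broadcast.

```python
import socket

def open_shared_receiver(port):
    """Open a UDP socket on the wildcard address that can share its port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Must be set before bind(), on every socket that shares the port.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(('', port))
    return s

first = open_shared_receiver(0)        # port 0: let the OS pick a free port
port = first.getsockname()[1]
second = open_shared_receiver(port)    # same (address, port) pair as first

ports = (first.getsockname()[1], second.getsockname()[1])
first.close()
second.close()
print(ports[0] == ports[1])
```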

Notes and Caveats

As the StackOverflow answer author notes, the semantics of SO_REUSEADDR are different across platforms. This information appears to be accurate for all Linux 3.x kernels and most if not all 2.x kernels as well. The author also notes that as of 3.9.x, Linux supports SO_REUSEPORT, which has slightly different semantics (both for TCP and UDP). Check out the man pages for newer kernels to see what this option has to offer.

Regarding the context for this problem, one might be led to ask why I needed a "real" network interface to run multiple SITL instances on a single machine. It turns out that the loopback interface (the one typically with address 127.0.0.1, used for interprocess communications on the local machine) does not support broadcast. There are apparently very good technical reasons for this, but I did not delve deeply into those. It is, however, possible to create a virtual Ethernet interface and even a virtual bridge, and those do support broadcasts.

One might also wonder if it is possible to "cheat" the binding rules by using sub-interfaces - that is, assigning multiple IP addresses to a single interface. Depending on the implementation and tool you use, this will appear either as a single interface (e.g., eth0) having multiple addresses, or as multiple interfaces (e.g., eth0.0, eth0.1, etc). Again, because we must bind to the wildcard address, these do not help us. Admittedly, there are some subtleties regarding interface-specific bindings that I did not choose to explore (check out SO_BINDTODEVICE). However, given that sub-interfaces are essentially implemented as assigning multiple addresses to the same interface, my intuition is that even this will not work.

Fortunately, in the end there was a working solution, and one that required minimal adjustments to my code. One further point on that: not only do other sockets on the same machine see each broadcast datagram, the sending socket (if also bound to the same port) does too! This means that if you are sending and receiving datagrams on the same socket and do not want to receive your own broadcasts, you will need to implement some extra logic to filter these out. Since other local sockets' broadcasts will have the same source address and port as your broadcasts, you may need to introduce a special identifier in your datagram payload to denote which are from you.
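One way to implement that filtering is to prefix each payload with a per-process identifier and drop anything carrying your own. The sketch below is an illustration only: the 4-byte random identifier, the header layout, and the names wrap and unwrap are my own assumptions, not part of our actual protocol.

```python
import os
import struct

HEADER = struct.Struct('!I')   # 4-byte unsigned sender ID, network byte order
# Random ID chosen once per process; collisions are unlikely but possible.
SENDER_ID = HEADER.unpack(os.urandom(4))[0]

def wrap(payload, sender_id=SENDER_ID):
    """Prefix the payload with the sender's identifier before sending."""
    return HEADER.pack(sender_id) + payload

def unwrap(datagram, own_id=SENDER_ID):
    """Return the payload, or None if it is ours or too short to parse."""
    if len(datagram) < HEADER.size:
        return None
    (sender_id,) = HEADER.unpack(datagram[:HEADER.size])
    if sender_id == own_id:
        return None            # our own broadcast, reflected back to us
    return datagram[HEADER.size:]

own = wrap(b'Hello world')
other = wrap(b'Hello world', sender_id=SENDER_ID ^ 1)  # simulated other sender
print(unwrap(own))     # None
print(unwrap(other))
```

You would call wrap() on the data passed to sendto() and unwrap() on the data returned by recvfrom().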
