exanic-software icon indicating copy to clipboard operation
exanic-software copied to clipboard

Bug fix for exasock timeout logic

Open jnmcmanus opened this issue 1 year ago • 0 comments

There's a bug in libs/exasock/socket/common.h that corrupts the timeout parameters set on exasock sockets. The following line occurs twice in this file:

const struct timespec *to = (const struct timespec *)&to_val

This is a static cast from a timeval pointer to a timespec pointer.

Here's how timeval is defined in the GNU C library:

struct timeval {
    time_t   tv_sec;   /* Seconds */
    long int tv_usec;  /* Microseconds */
};

And here's the definition of timespec:

struct timespec {
    time_t   tv_sec;   /* Seconds */
    long int tv_nsec;  /* Nanoseconds */
};

The two problems with this are:

  1. Technically, the types of tv_usec and tv_nsec are implementation defined. In the GNU C library, they're the same (long int), but these could be different types altogether in another implementation.
  2. More gravely, one struct represents the time as a (seconds, microseconds) pair, while the other uses (seconds, nanoseconds)! Obviously, you can't simply cast a pointer from one struct to the other and be done with it. But that's what exasock does, today. Socket options like the send timeout (SO_SENDTIMEO) are specified with a timeval instance. Then, to compute the timeout time, exasock "converts" this to a timespec, so a timeout given in microseconds is interpreted as a time in nanoseconds.

The consequence of this bug is that timeouts set to any value less than a second are essentially ignored. For instance, a timeout of 900,000 microseconds (900 milliseconds) is treated like a timeout of 900,000 nanoseconds (900 microseconds). Since the resolution of the CLOCK_MONOTONIC_COARSE system timer is on the order of milliseconds on Linux (e.g., see https://upvoid.com/devblog/2014/05/linux-timers/), a timeout parameter < 1 millisecond has little meaning/predictability.

This pull request suggests a minimal changeset to fix the bug. It has been tested on a Linux system and confirmed correct.

Resolves Issue #84

jnmcmanus avatar Jan 29 '24 16:01 jnmcmanus