libnl icon indicating copy to clipboard operation
libnl copied to clipboard

lib: adjust small time values in _badrandom_from_time

Open cferris1000 opened this issue 1 year ago • 1 comments

On some systems, the clock is reset, or is lost, so the value returned by time can be a very small value. In that case, the _badrandom_from_time function returns a large value close to the maximum uint32_t value. This can be a problem when used to create a sequence number and that number overflows the uint32_t maximum value when it is incremented.

In this case, detect when the time value is too small, and add a value to make sure the value returned is not too close to the uint32_t maximum value.

cferris1000 avatar Jul 10 '24 22:07 cferris1000

This can be a problem when used to create a sequence number and that number overflows the uint32_t maximum value when it is incremented.

What is exactyly the problem with the wrap around of the sequence number?

_badrandom_from_time() can give any number. If some numbers are unsuitable, then the caller needs to handle that (e.g. by handling (avoiding?) wrap around).

thom311 avatar Jul 12 '24 07:07 thom311

This happens because the s_seq_next value will be some really large value, and then in nl_socket_use_seq() the sk->s_seq_next++ will overflow on these devices that don't have a clock set to the current time and just start time at 0. In our code base, we compile the code such that it will abort on overflow, so these devices abort quickly.

I did have another version that avoids the overflow by modifying nl_socket_use_seq to check if the sequence number is at the maximum uint32_t value then set it to zero rather than overflow. The downside to the sequence overflow check is that you also need to modify lib/nl.c, the nl_complete_msg() function to call nl_socket_use_seq (or duplicate the logic there). I am probably overthinking this, but I was worried that it would be adding extra instructions or extra function calls to hot paths. I was also not sure if the code can handle the sequence number getting set back to zero.

Should I abandon this patch and do the overflow check patch I mentioned above? Is there another approach that would be better?

Thanks.

cferris1000 avatar Jul 12 '24 21:07 cferris1000

In our code base, we compile the code such that it will abort on overflow, so these devices abort quickly.

That seems a questionable thing to do. It is understood that wrap around of unsigned integers is well defined. libnl relies on that. You can build your own code base with whatever flags work for you, but you are making assumptions (that wrap around of unsigned int would be always a bug), that libnl does not share.

Btw, your patch doesn't avoid the wrap around either. It just makes it only happen after a large number of messages, it still can happen. That's at least ugly, even if you don't ever expect to send 3 billion messages before the wrap around happens.

I was also not sure if the code can handle the sequence number getting set back to zero.

I think zero should be fine. Regardless, if zero were to be avoided, then the logic for that should be at s_seq_next and not in _badrandom_from_time().

I did have another version that avoids the overflow by modifying nl_socket_use_seq to check if the sequence number is at the maximum uint32_t value then set it to zero rather than overflow.

That seems better. I would take such a patch. Or maybe re-evaluate, whether rejecting wrap around of unsigned integers should really be treated as a bug.

The downside to the sequence overflow check is that you also need to modify lib/nl.c, the nl_complete_msg() function to call nl_socket_use_seq (or duplicate the logic there).

As you say, that is no problem. Don't let lib/nl.c directly access s_seq_next but let it call nl_socket_use_seq().

thom311 avatar Jul 15 '24 06:07 thom311

Create a pull request of the sequence overflow protection here:

https://github.com/thom311/libnl/pull/395

cferris1000 avatar Jul 17 '24 02:07 cferris1000

closing for 395

thom311 avatar Jul 17 '24 06:07 thom311