RTO timer being removed

Open romandivackyabsa opened this issue 3 years ago • 5 comments

Hi,

I am debugging an issue with packets not being retransmitted. I can see an RTO timer being created in ci_tcp_ds_done(), yet when ci_ip_timer_poll() runs a few milliseconds later the timer is no longer there. I instrumented all the ci_tcp_rto_* functions that manipulate the timer, and none of them is called. I also instrumented the actual RTO callback, and it is not called either.

Can someone please tell me what removes the RTO timer? Thanks

romandivackyabsa • Feb 08 '22 10:02

Two obvious ways the RTO timer gets removed are:

  • incoming ACK;
  • TCP connection status change (shutdown, reset, etc).
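
To make the two removal paths above concrete, here is a minimal sketch of generic TCP retransmission-timer handling in the spirit of RFC 6298 (stop the timer once all outstanding data is acknowledged, restart it when an ACK covers only part of it). This is not Onload's code: struct demo_conn and every demo_* function are hypothetical names invented for illustration, and the real ci_tcp_rto_* logic may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection state; illustrative names only,
 * not Onload's real structures. */
struct demo_conn {
    uint32_t snd_una;    /* oldest unacknowledged sequence number */
    uint32_t snd_nxt;    /* next sequence number to be sent */
    bool     rto_armed;  /* true while the RTO timer is pending */
    uint64_t rto_expiry; /* absolute expiry time, in ticks */
};

/* Sequence-space "less than" that tolerates 32-bit wraparound. */
bool demo_seq_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

void demo_rto_arm(struct demo_conn *c, uint64_t now, uint64_t rto)
{
    assert(c != NULL);
    c->rto_armed = true;
    c->rto_expiry = now + rto;
}

void demo_rto_clear(struct demo_conn *c)
{
    assert(c != NULL);
    c->rto_armed = false;
}

/* Removal path 1: an incoming ACK. */
void demo_on_ack(struct demo_conn *c, uint32_t ack, uint64_t now, uint64_t rto)
{
    assert(c != NULL);
    /* Ignore ACKs that do not advance snd_una or that cover unsent data. */
    if (!demo_seq_lt(c->snd_una, ack) || demo_seq_lt(c->snd_nxt, ack))
        return;
    c->snd_una = ack;
    if (c->snd_una == c->snd_nxt)
        demo_rto_clear(c);          /* everything acknowledged: stop timer */
    else
        demo_rto_arm(c, now, rto);  /* data still in flight: restart timer */
}

/* Removal path 2: a connection state change (here, a reset). */
void demo_on_reset(struct demo_conn *c)
{
    demo_rto_clear(c);
}
```

In this sketch, a connection with snd_una = 100 and snd_nxt = 300 keeps its timer armed after an ACK for 200 and loses it after an ACK for 300, without any "RTO expired" callback ever running.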

> I instrumented all the ci_tcp_rto_* manipulating methods

Just to be sure: do you understand that these functions exist in both the libonload.so library and the onload.ko kernel module? If by "instrumenting" you mean adding ci_log() calls, then you should recompile both (and reload the module). If you instrument in some other way, you still need to remember this and apply the instrumentation in both places.

ol-alexandra • Feb 08 '22 10:02

I am debugging just the userspace libonload.so. I can see the RTO timer there (dumped by ci_ip_timer_state_dump()), yet the next call (on the same netif) from ci_ip_timer_poll() doesn't list the RTO timer. I'm not sure why or how the kernel module could affect those.

Regarding ACKs: am I wrong in assuming that an ACK removes the RTO timer only in ci_tcp_rx_free_acked_bufs()? That is not called in my case. Or is there some other place that I missed?

romandivackyabsa • Feb 08 '22 10:02

Let me repeat - all these functions can be called from onload.ko. It is completely useless to "debug just the userspace". For example:

  • I can see the RTO timer there (dumped by ci_ip_timer_state_dump())
  • The timer is handled from onload.ko
  • Next call (on the same netif) from ci_ip_timer_poll() doesn't list the RTO timer.

> Am I wrong in assuming that an ACK removes the RTO timers only in ci_tcp_rx_free_acked_bufs()? That is not called in my case.

How can you say that when you have no idea of what's going on in the kernel module?

ol-alexandra • Feb 08 '22 10:02

So what you're implying is that the kernel and the userspace library both access the same data structures for the same TCP connection? That would explain why I am not seeing the ACK.

Would you give me a few pointers on where to look to see how this userspace/kernel cooperation is implemented?

romandivackyabsa • Feb 08 '22 11:02

> So what you're implying is that the kernel and the userspace library both access the same data structures for the same TCP connection?

Yes.

> Would you give me a few pointers on where to look to see how this userspace/kernel cooperation is implemented?

The "netif state", ni->state, is the shared memory. It is mapped into both the kernel and userland.

ol-alexandra • Feb 08 '22 11:02
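
The effect discussed in this thread (a timer vanishing from shared state while none of your instrumented functions run) can be reproduced with a small analogy. The sketch below is an assumption-laden illustration, not Onload code: struct demo_shared_state, demo_timer_clear() and demo_run() are hypothetical names, and a forked child process stands in for onload.ko, which in reality is a kernel module touching the same mapped memory rather than a separate process.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS on glibc in strict modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for state shared between two contexts.
 * This is NOT the layout of Onload's ni->state. */
struct demo_shared_state {
    volatile int rto_pending;  /* 1 while an RTO timer is queued */
};

/* "Instrumentation": counts calls made in the current process only. */
static int demo_clear_calls = 0;

void demo_timer_clear(struct demo_shared_state *st)
{
    assert(st != NULL);
    demo_clear_calls++;
    st->rto_pending = 0;
}

/* Arms the timer, lets another context clear it, then reports what the
 * original context observes. Returns 0 on success, -1 on error. */
int demo_run(int *pending_after, int *local_calls)
{
    struct demo_shared_state *st =
        mmap(NULL, sizeof *st, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (st == MAP_FAILED)
        return -1;

    st->rto_pending = 1;          /* timer visible in shared state */

    pid_t pid = fork();
    if (pid < 0) {
        munmap(st, sizeof *st);
        return -1;
    }
    if (pid == 0) {               /* the "other context" */
        demo_timer_clear(st);     /* counter bumps in the child only */
        _exit(0);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        munmap(st, sizeof *st);
        return -1;
    }

    *pending_after = st->rto_pending;  /* the timer is gone ...          */
    *local_calls = demo_clear_calls;   /* ... yet no local call was seen */
    munmap(st, sizeof *st);
    return 0;
}
```

Calling demo_run(&pending, &calls) from the parent leaves pending at 0 while calls stays at 0, because the counter lives in private memory and only the mapping is shared. That is the same shape as instrumenting libonload.so alone while onload.ko modifies ni->state.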