HomaModule
Use ECONNRESET for homa_copy_to_user if recvmsg called with id 0
Since the recvmsg call is in fact valid in this case, returning EINVAL can be confusing. I think it would be better to use a different error number to indicate this situation. I chose ECONNRESET, but maybe you have a better idea?
I'm taking a look at this now... sorry it took so long. Can you help me understand the situation better: under what conditions is your new code executed? I'm wondering if it would be better to change the error code at the place where the error is first detected, rather than here.
Sorry for not giving broader context. I noticed this error when transmitting relatively large messages (e.g. 8k) concurrently on one socket (e.g. 100 RPCs). The client exits and sends an abort while the server side is still returning from homa_copy_to_user. At the end of homa_copy_to_user,
    if (rpc->state == RPC_DEAD)
        error = -EINVAL;
these two lines find that the RPC has already been aborted and return EINVAL.
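To illustrate why the choice of error number matters to an application, here is a minimal userspace sketch of how a receive loop might treat the two codes differently. The names `recv_action` and `classify_recv_result` are hypothetical helpers for illustration and are not part of Homa; the mapping shown is one possible policy, not something the module prescribes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical policy for a userspace receive loop; not part of Homa. */
enum recv_action { RECV_OK, RECV_RETRY, RECV_FATAL };

/*
 * result: return value of recvmsg() (-1 on failure, >= 0 on success).
 * err:    value of errno captured right after the call.
 */
static enum recv_action classify_recv_result(long result, int err)
{
	assert(result >= -1);
	if (result >= 0)
		return RECV_OK;
	switch (err) {
	case ECONNRESET:	/* peer went away; the call itself was valid */
	case EINTR:
	case EAGAIN:
		return RECV_RETRY;
	default:		/* e.g. EINVAL: assume the caller passed bad arguments */
		return RECV_FATAL;
	}
}
```

With EINVAL, an application following this policy would treat a vanished peer as its own programming error, which is the confusion described above.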
Thanks for the additional information. So the client is closing its socket while RPCs are still in progress? Do you know how the server finds out about this and decides to call homa_rpc_free? Is it receiving an UNKNOWN packet from the client? My memories of how this works have faded, so before I dig in deeper I thought I'd see if you have already figured this out. What I'd like to do is determine who on the server is deciding to kill the RPC (at that point there is the best information about why it's being killed) and then set the error code in the RPC to ECONNRESET there. Then homa_copy_to_user can return that code.
-John-
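A rough sketch of the idea proposed above: record the reason at the point where the RPC is killed, then have the copy path report that reason. Everything here is a simplified, hypothetical stand-in (`struct sketch_rpc`, `sketch_rpc_kill`, `sketch_copy_result`); it is not the actual Homa code, and the real structures and locking are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real Homa definitions. */
enum sketch_rpc_state { SKETCH_RPC_INCOMING, SKETCH_RPC_DEAD };

struct sketch_rpc {
	enum sketch_rpc_state state;
	int error;	/* 0, or a negative errno recorded when the RPC is killed */
};

/* Called where the RPC is killed; reason is a negative errno value. */
static void sketch_rpc_kill(struct sketch_rpc *rpc, int reason)
{
	assert(rpc != NULL);
	rpc->error = reason;
	rpc->state = SKETCH_RPC_DEAD;
}

/*
 * End of the copy path: a copy failure takes precedence; otherwise a dead
 * RPC reports the recorded reason, falling back to -EINVAL if none was set.
 */
static int sketch_copy_result(const struct sketch_rpc *rpc, int copy_error)
{
	assert(rpc != NULL);
	if (copy_error)
		return copy_error;
	if (rpc->state == SKETCH_RPC_DEAD)
		return rpc->error ? rpc->error : -EINVAL;
	return 0;
}
```

The point of the design is that the code killing the RPC knows why it is being killed, so the error is chosen there rather than guessed later.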
So the client is closing its socket while RPCs are still in progress?
yes
Do you know how the server finds out about this and decides to call homa_rpc_free?
AFAIK the client sends an abort to the server, and the server calls homa_rpc_free, which sets the RPC to dead in softirq context (homa_rpc_abort).
There is no "send abort" in Homa, so I think the server must be finding out via an UNKNOWN packet. I'll try implementing a fix based on that assumption, then you can tell me whether you're getting ECONNRESET as expected.
-John-
After looking at this some I've found a cleaner solution. The dead RPC should not be returned from recvmsg in the first place; this is a bug. The code in homa_wait_for_message that checks for this is in the wrong place. Once I fix that, the dead RPC will be skipped, so there will not be a need to return an error from recvmsg; it will just go on to the next RPC.
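The skipping behavior described above could look roughly like the following. This is only an illustrative sketch under simplifying assumptions: `struct ready_rpc`, its fields, and `sketch_next_live` are hypothetical, the ready RPCs are modeled as a plain array, and the locking that real kernel code would need is left out. It does not reproduce homa_wait_for_message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of RPCs that are ready to be received. */
enum ready_state { SKETCH_READY, SKETCH_DEAD };

struct ready_rpc {
	unsigned long id;
	enum ready_state state;
};

/*
 * Return the first RPC that is still alive, or NULL if there is none.
 * Dead RPCs are skipped instead of being handed back to recvmsg, so no
 * error has to be reported for them.
 */
static struct ready_rpc *sketch_next_live(struct ready_rpc *rpcs, size_t count)
{
	assert(rpcs != NULL || count == 0);
	for (size_t i = 0; i < count; i++) {
		if (rpcs[i].state == SKETCH_DEAD)
			continue;
		return &rpcs[i];
	}
	return NULL;
}
```

When the function returns NULL, the caller would presumably go back to waiting for another message rather than returning an error to the application.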
Indeed, simply moving on to the next RPC would be ideal for userspace applications.