Constantine Peresypkin

Results 252 comments of Constantine Peresypkin

@milantracy do you need the IP_MTU_DISCOVER flag not to fail, or do you actually need tracepath to find the path? The latter is kind of a security issue. Fixing the flag doesn't...
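For context on the first option, here is a minimal sketch of checking whether setting IP_MTU_DISCOVER on a socket is accepted at all. It assumes Linux: the numeric fallbacks (10 and 2) are the Linux values from `<linux/in.h>`, and the helper name `probe_pmtu_discover` is hypothetical, not something from the thread.

```python
import socket

# Assumption: Linux. Python's socket module does not always export these
# names, so fall back to the numeric values from <linux/in.h>.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)


def probe_pmtu_discover():
    """Return True if the kernel (or sandbox) accepts IP_MTU_DISCOVER, else False.

    Hypothetical helper for illustration only.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            # Ask for path MTU discovery (sets the Don't Fragment bit).
            sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
        return True
    except OSError:
        # The option (or the socket itself) was rejected by the environment.
        return False


if __name__ == "__main__":
    print("IP_MTU_DISCOVER accepted:", probe_pmtu_discover())
```

This only shows whether the option is accepted. It says nothing about whether actual path discovery, as tracepath performs it, would work.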

I will add some tests soon.

@dobrac I have improved your tests to iterate until the pending ops queue is reproduced. It now repros quite reliably in under 10 iterations for me locally.
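The "iterate until reproduced" idea can be sketched roughly as below. This is not the actual test code from the PR: `iterate_until_repro`, `run_once`, and the attempt limit are hypothetical names and values chosen for illustration.

```python
def iterate_until_repro(run_once, max_attempts=30):
    """Call run_once() until it reports a reproduction.

    Returns the 1-based attempt number on which the issue reproduced,
    or None if it never reproduced within max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        if run_once():
            return attempt
    return None


if __name__ == "__main__":
    # Stand-in for a real test body: reproduces on the third call.
    outcomes = iter([False, False, True])
    attempt = iterate_until_repro(lambda: next(outcomes))
    print("reproduced on attempt", attempt)
```

Returning the attempt number rather than a bare boolean makes it easy to track how the repro frequency changes after a fix.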

Codecov's idea of "coverage" seems incorrect here. Flagging a debug print is not the best use of coverage checks. So, ignored.

@bchalios @kalyazin I'm not sure how to kick off the Codecov pass again. Other than that, this one should be ready.

@bchalios Here's the thread dump:

```
firecracker (pid=155):
[] ep_poll+0x46a/0x4a0
[] do_epoll_wait+0x58/0xd0
[] do_compat_epoll_pwait.part.0+0x12/0x90
[] __x64_sys_epoll_pwait+0x8c/0x150
[] x64_sys_call+0x1814/0x2330
[] do_syscall_64+0x81/0xc90
[] entry_SYSCALL_64_after_hwframe+0x76/0x7e
fc_api (pid=156):
[] ep_poll+0x46a/0x4a0
[] do_epoll_wait+0x58/0xd0
[] do_compat_epoll_pwait.part.0+0x12/0x90...
```

@bchalios I think your explanation is on point:

> The interrupt we send was not actually being sent in the guest in the case of snapshot resume, because part of the...

@bchalios I think your fix works well! The repro frequency is lower, but the issue can still be reproduced:

```
================================================================================
Attempt 13/30 - Testing for non-zero async I/O drain
================================================================================
2025-12-16T00:58:24.910953439...
```

We also experimented with scaling to zero and back, but that may lead to version discrepancies between various worker groups. So it's safer to just kill everything in our case.

I don't care much about the head pod. I manage all actors from the application anyway, so if it restarts, it's OK. Obviously it would be nice if it would re-create...