nbio: linux: flush_completions can hang forever

Open • HeavyHorst opened this issue 9 months ago • 0 comments

flush_completions :: proc(io: ^IO, wait_nr: u32, timeouts: ^uint, etime: ^bool) -> os.Errno {
	cqes: [256]io_uring.io_uring_cqe
	wait_remaining := wait_nr
	for {
		completed, err := io_uring.copy_cqes(&io.ring, cqes[:], wait_remaining)
		if err != .None do return ring_err_to_os_err(err)

		// wait_remaining and completed are both u32: this wraps when completed > wait_remaining
		wait_remaining = max(0, wait_remaining - completed)

		if completed > 0 {
			queue.reserve(&io.completed, int(completed))
			for cqe in cqes[:completed] {
				io.ios_in_kernel -= 1

				if cqe.user_data == 0 {
					timeouts^ -= 1

					if (-cqe.res == i32(os.ETIME)) {
						etime^ = true
					}
					continue
				}

				completion := cast(^Completion)uintptr(cqe.user_data)
				completion.result = cqe.res

				queue.push_back(&io.completed, completion)
			}
		}

		if completed < len(cqes) do break
	}

	return os.ERROR_NONE
}

If "wait_remaining" is less than "completed", this line:

wait_remaining = max(0, wait_remaining - completed)

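underflows. wait_remaining and completed are both u32, so the subtraction wraps around before max(0, ...) is evaluated, and max cannot undo it because an unsigned value is never below 0. wait_remaining becomes huge (4294967040 in my case, which is 2^32 - 256, consistent with wait_remaining being 0 when a full batch of 256 completions was copied). The next call to io_uring.copy_cqes then waits for that many completions and never returns.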
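The wraparound can be reproduced without io_uring. This is a minimal sketch that assumes, as the reported value suggests, wait_remaining = 0 and a full batch of 256 completions; buggy_update is a hypothetical helper that copies only the arithmetic of the original line, not code from odin-http.

```odin
package main

import "core:fmt"

// Hypothetical helper reproducing the original update line.
// The u32 subtraction wraps first; max(0, x) is then a no-op,
// because a u32 value can never be below 0.
buggy_update :: proc(wait_remaining, completed: u32) -> u32 {
	return max(0, wait_remaining - completed)
}

main :: proc() {
	// Assumed state: nothing left to wait for, but a full batch arrived.
	wait_remaining: u32 = 0
	completed: u32 = 256

	result := buggy_update(wait_remaining, completed)
	fmt.println(result) // prints 4294967040 (2^32 - 256)
}
```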

If I change the code to

		if wait_remaining < completed {
			wait_remaining = 0
		} else {
			wait_remaining = max(0, wait_remaining - completed)
		}

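the problem no longer occurs: the subtraction now only runs when wait_remaining >= completed, so it cannot wrap (the max(0, ...) left in that branch is redundant but harmless).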
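A small simulation of the corrected counter update, as a sketch: fixed_update is a hypothetical helper with the same branch as the proposed fix, and the batch sizes are made up to stand in for successive copy_cqes results, not taken from a real run.

```odin
package main

import "core:fmt"

// Hypothetical helper with the same logic as the proposed fix:
// clamp to 0 instead of letting the u32 subtraction wrap.
fixed_update :: proc(wait_remaining, completed: u32) -> u32 {
	if wait_remaining < completed {
		return 0
	}
	return wait_remaining - completed
}

main :: proc() {
	// Made-up batch sizes standing in for successive copy_cqes results.
	batches := [?]u32{256, 256, 10}
	wait_remaining: u32 = 300

	for completed in batches {
		wait_remaining = fixed_update(wait_remaining, completed)
		fmt.println(wait_remaining) // 44, then 0, then 0
	}
}
```

The counter clamps at 0 instead of wrapping. An equivalent single statement would be wait_remaining -= min(wait_remaining, completed), since min never returns more than wait_remaining and the subtraction therefore cannot go below 0.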

HeavyHorst, Mar 08 '25 19:03