infra
Spawning sandboxes gets stuck on loading snapshot
Sometimes when spawning a sandbox the process gets stuck on loading the snapshot and times out.
One possible cause is network namespace handling in Go. A network namespace is a property of an OS thread, while the Go scheduler is free to move a goroutine between threads, so a goroutine can end up running in a different namespace than the one it switched into.
This bug would cause the sandbox request to return a 500 "Cannot create a environment instance right now" error.
I think this is caused by the UFFD handler, which sometimes panics with:
thread 'main' panicked at src/firecracker/examples/uffd/valid_handler.rs:32:14:
uffd_msg not ready
stack backtrace:
   0: rust_begin_unwind
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
   1: core::panicking::panic_fmt
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
   2: core::panicking::panic_display
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:150:5
   3: core::panicking::panic_str
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:134:5
   4: core::option::expect_failed
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/option.rs:1988:5
   5: uffd_valid_handler::uffd_utils::Runtime::run
   6: uffd_valid_handler::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
This issue is no longer relevant as we moved the UFFD handler into the orchestrator.