wasmtime
Unable to load modules on aarch64 linux machine via a rust based host app
Steps to Reproduce
- The app uses the following to load the module:
let mut config = wasmtime::Config::new();
let engine = wasmtime::Engine::new(&config)?;
let module = wasmtime::Module::from_file(&engine, "./test.wasm")?;
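For anyone else trying to reproduce this, here is a self-contained version of the snippet above; it is only a sketch and assumes a test.wasm in the working directory plus the anyhow crate for error handling:

use wasmtime::{Config, Engine, Module};

fn main() -> anyhow::Result<()> {
    // Default configuration; no special settings are needed to hit the bug.
    let config = Config::new();
    let engine = Engine::new(&config)?;
    // On the affected machine this call panics inside wasmtime-jit's
    // code_memory.rs before a module is ever returned.
    let module = Module::from_file(&engine, "./test.wasm")?;
    println!("loaded module with {} exports", module.exports().count());
    Ok(())
}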
Expected Results
The module should load
Actual Results
Load fails with:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 22, kind: InvalidInput, message: "Invalid argument" }', /home/rbackhouse/.cargo/registry/src/github.com-1ecc6299db9ec823/wasmtime-jit-1.0.0/src/code_memory.rs:63:14
Versions and Environment
Wasmtime version or commit: 1.0.0
Operating system: Ubuntu 18.04
Architecture: aarch64
Extra Info
The CPU being used is a Quad-core ARM Cortex-A57 MPCore processor.
Looking at code_memory.rs line 60, it seems the call

rustix::process::membarrier(
    rustix::process::MembarrierCommand::RegisterPrivateExpeditedSyncCore,
)
.unwrap();

is failing.
If I comment out this call and the one at line 168, everything works fine.
cc @akirilov-arm
Which kernel version are you using? And which glibc version?
Kernel version is 4.9.253-tegra; glibc version is (Ubuntu GLIBC 2.27-3ubuntu1.4) 2.27.
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE was introduced in Linux 4.16. Maybe Wasmtime will have to fall back to MEMBARRIER_CMD_SHARED when MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE is not available?
I am not sure whether MEMBARRIER_CMD_SHARED would provide the same guarantee as MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, and it definitely seems more heavyweight, since it affects all threads in the system (not just the ones in the current process). IMHO the first decision to make is how old a kernel version Wasmtime should support.
It looks like Linux 4.9.253 was released on Jan 23, 2021. It is a maintenance release of an older series (4.9), but 5.0 was released in Mar 2019 and is not that old. IMHO we should keep support for kernels older than that if we can (though I'm not sure where to draw the line). Especially for aarch64, kernel forks for SoCs are sometimes a bit more out of date.
(Err, I noted 5.0 above but (i) 4.9 was followed by 4.10, not 5.0 and (ii) the feature in question was in 4.16 as @bjorn3 notes; so the relevant date is Apr 1, 2018 which, also, is not that old...)
Can we cascade into the fallback command (MEMBARRIER_CMD_SHARED) if the current, more up-to-date choice gives us EINVAL?
OK, I got confirmation that MEMBARRIER_CMD_SHARED was an acceptable substitute, keeping in mind the following:
- It is a much more expensive operation, because it affects all threads in all processes in the system instead of only the threads in the current process
- It could still fail if the nohz_full parameter is passed to the kernel
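For concreteness, here is a minimal sketch of that fallback, cascading only when the kernel reports EINVAL as suggested above. The helper name is hypothetical and this is not the actual patch:

use rustix::io::Errno;
use rustix::process::{membarrier, MembarrierCommand};

// Hypothetical helper: try the cheap per-process registration first and
// only cascade to the system-wide Global command when the kernel does not
// know the command at all (EINVAL, i.e. pre-4.16 kernels).
fn register_sync_core_with_fallback() -> rustix::io::Result<()> {
    match membarrier(MembarrierCommand::RegisterPrivateExpeditedSyncCore) {
        Err(Errno::INVAL) => membarrier(MembarrierCommand::Global),
        other => other,
    }
}

Note that, per the list above, the Global command can itself fail on kernels booted with nohz_full, so a real fix may want to propagate that error rather than unwrap it.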
It looks like rustix doesn't currently have a binding for MEMBARRIER_CMD_SHARED; once it does, I'm happy to do a PR to add the fallback behavior here. @sunfishcode, would you be willing to update rustix to add that option?
MEMBARRIER_CMD_SHARED is an alias of MEMBARRIER_CMD_GLOBAL for the sake of backward compatibility.
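For reference, the relevant command bits from the Linux UAPI header <linux/membarrier.h>, transcribed here as Rust constants:

// Command bits from <linux/membarrier.h>.
const MEMBARRIER_CMD_GLOBAL: u32 = 1 << 0;
// Old name for the same bit, kept for backward compatibility:
const MEMBARRIER_CMD_SHARED: u32 = MEMBARRIER_CMD_GLOBAL;
const MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE: u32 = 1 << 5;
const MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE: u32 = 1 << 6;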
Ah! In that case, let's see if #4987 works (@rbackhouse, would you be able to test this patch?).
@cfallin Yes, I can try it.
@cfallin crates/jit/src/code_memory.rs line 168 also has a call to rustix::process::membarrier. Should this have the same

.or_else(|_| rustix::process::membarrier(rustix::process::MembarrierCommand::Global))

added too?
@cfallin Note that the cranelift-jit crate has similar code paths, though currently they appear to fail silently.
I think I have hit the same issue. The error message is PermissionDenied, but the test is running in Docker as root.
Wasmtime version or commit: 0.35.3
Operating system: CentOS Linux release 7.9.2009 (AltArch)
Architecture: aarch64
kernel: Linux 7867c1d5ae55 4.18.0-348.20.1.el7.aarch64 #1 SMP Wed Apr 13 20:57:50 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
Running 1 test case...
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 1, kind: PermissionDenied, message: "Operation not permitted" }', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/wasmtime-jit-0.35.3/src/code_memory.rs:63:14
stack backtrace:
0: 0xb058d8 - std::backtrace_rs::backtrace::libunwind::trace::h8a5e4a04f7e58fc7
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
1: 0xb058d8 - std::backtrace_rs::backtrace::trace_unsynchronized::hc9931ed5b94829c1
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0xb058d8 - std::sys_common::backtrace::_print_fmt::h93cac7fc90870819
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/sys_common/backtrace.rs:66:5
3: 0xb058d8 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hb80d1ce809872875
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/sys_common/backtrace.rs:45:22
4: 0xb571d8 - core::fmt::write::h178b28c8855699fc
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/core/src/fmt/mod.rs:1198:17
5: 0xaf7c4c - std::io::Write::write_fmt::h9bb27fefa849e7e9
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/io/mod.rs:1672:15
6: 0xb08218 - std::sys_common::backtrace::_print::h39936b176af986e7
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/sys_common/backtrace.rs:48:5
7: 0xb08218 - std::sys_common::backtrace::print::h326caf59757e2648
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/sys_common/backtrace.rs:35:9
8: 0xb08218 - std::panicking::default_hook::{{closure}}::hf783bfb93990b280
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/panicking.rs:295:22
9: 0xb07f2c - std::panicking::default_hook::hba8a0c46968da93d
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/panicking.rs:314:9
10: 0xb08948 - std::panicking::rust_panic_with_hook::h7a13c9edc6d8d37a
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/panicking.rs:698:17
11: 0xb08830 - std::panicking::begin_panic_handler::{{closure}}::h488b160e52dfebaf
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/panicking.rs:588:13
12: 0xb05de8 - std::sys_common::backtrace::__rust_end_short_backtrace::h9348312e0595d087
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/sys_common/backtrace.rs:138:18
13: 0xb0857c - rust_begin_unwind
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/std/src/panicking.rs:584:5
14: 0x404108 - core::panicking::panic_fmt::h8466cbb0f1c51a1e
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/core/src/panicking.rs:142:14
15: 0x4041f4 - core::result::unwrap_failed::he9d64115a287a1c2
at /rustc/2643b16468fda787470340890212591d8bc832b7/library/core/src/result.rs:1814:5
16: 0xfdc284 - wasmtime_jit::instantiate::CompiledModule::from_artifacts::h983421f6fde1119e
17: 0xc97b40 - <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold::hb87417df58489414
18: 0xcb578c - rayon::iter::plumbing::Producer::fold_with::h4fc4a166544a14c4
19: 0xd03744 - rayon::iter::plumbing::bridge_producer_consumer::helper::h7d219b477f38f004
20: 0xcb7520 - <rayon::vec::IntoIter<T> as rayon::iter::IndexedParallelIterator>::with_producer::h4be24b5438d94693
21: 0xcb6a5c - <rayon::iter::map::Map<I,F> as rayon::iter::ParallelIterator>::drive_unindexed::haf000ffd9729773f
22: 0xd1f048 - rayon::iter::collect::<impl rayon::iter::ParallelExtend<T> for alloc::vec::Vec<T>>::par_extend::hd53d53241b744d71
23: 0xd27a94 - rayon::result::<impl rayon::iter::FromParallelIterator<core::result::Result<T,E>> for core::result::Result<C,E>>::from_par_iter::he901005655a8d72d
24: 0xcccaf0 - wasmtime::module::Module::from_binary::h5c02d969de2d9338
I'm not sure that is the same issue: you have kernel 4.18, which should support this! (Or fail with code 22 if it isn't implemented on AArch64.)
However, I did a quick Google search, and it might be that the membarrier syscall isn't whitelisted in Docker; can you check if that's what's happening?
Edit: And this link mentions that it was only added to Docker's default seccomp list in 20.10.
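A quick way to tell the two failure modes apart is to call the syscall directly and look at the errno. This hedged sketch uses rustix: an old kernel yields EINVAL, while a seccomp-filtered syscall (as in Docker's default profile before 20.10) yields EPERM:

use rustix::io::Errno;
use rustix::process::{membarrier, MembarrierCommand};

fn main() {
    match membarrier(MembarrierCommand::RegisterPrivateExpeditedSyncCore) {
        Ok(()) => println!("membarrier registration succeeded"),
        // The kernel predates this command (it was added in Linux 4.16).
        Err(Errno::INVAL) => println!("command not supported by this kernel"),
        // The syscall was rejected, e.g. by a container seccomp profile.
        Err(Errno::PERM) => println!("blocked, likely by seccomp"),
        Err(e) => println!("membarrier failed: {:?}", e),
    }
}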
@afonso360 You are right: I found the Docker Engine version was 1.13. After upgrading Docker to 20.10, the test passes. Thank you very much!