glommio
Issues with Docker on M1
I think there might be some issues running glommio in docker on the M1 machines... I've been having some trouble with timers and sockets, and after stripping things down to their most barebones (examples from the docs), I'm still getting incorrect results. Please correct me if I'm missing something here, I'm nowhere close to a glommio expert.
For example:
use glommio::{Latency, LocalExecutor, LocalExecutorBuilder, Placement, Shares};
use std::time::Duration;
fn main() {
    let local_ex = LocalExecutor::default();
    let res = local_ex.run(async move {
        let task_queue = glommio::executor().create_task_queue(
            Shares::default(),
            Latency::Matters(Duration::from_secs(1)),
            "my_tq",
        );
        let task = glommio::spawn_local_into(
            async {
                println!("Hello world");
            },
            task_queue,
        )
        .expect("failed to spawn task");
    });
}
Gives me (cargo run --bin glomtest):
warning: unused imports: `LocalExecutorBuilder`, `Placement`
--> src/glomtest/main.rs:1:39
|
1 | use glommio::{Latency, LocalExecutor, LocalExecutorBuilder, Placement, Shares};
| ^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused variable: `res`
--> src/glomtest/main.rs:11:9
|
11 | let res = local_ex.run(async move {
| ^^^ help: if this is intentional, prefix it with an underscore: `_res`
|
= note: `#[warn(unused_variables)]` on by default
warning: unused variable: `task`
--> src/glomtest/main.rs:17:13
|
17 | let task = glommio::spawn_local_into(
| ^^^^ help: if this is intentional, prefix it with an underscore: `_task`
warning: `asyncrs` (bin "glomtest") generated 3 warnings
Finished dev [unoptimized + debuginfo] target(s) in 43.69s
Running `target/debug/glomtest`
As you can see, no output.
Trying to use a pinned thread also does not work:
use glommio::{Latency, LocalExecutor, LocalExecutorBuilder, Placement, Shares};
use std::time::Duration;
fn main() {
    let mut threads = Vec::new();
    threads.push(
        LocalExecutorBuilder::new(Placement::Fixed(1))
            .spawn(|| async move {
                let task_queue = glommio::executor().create_task_queue(
                    Shares::default(),
                    Latency::Matters(Duration::from_secs(1)),
                    "my_tq",
                );
                let task = glommio::spawn_local_into(
                    async {
                        println!("Hello world");
                    },
                    task_queue,
                )
                .expect("failed to spawn task");
            })
            .unwrap(),
    );
    for handle in threads {
        handle.join().unwrap();
    }
}
Output:
warning: unused import: `LocalExecutor`
--> src/glomtest/main.rs:1:24
|
1 | use glommio::{Latency, LocalExecutor, LocalExecutorBuilder, Placement, Shares};
| ^^^^^^^^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused variable: `task`
--> src/glomtest/main.rs:17:21
|
17 | let task = glommio::spawn_local_into(
| ^^^^ help: if this is intentional, prefix it with an underscore: `_task`
|
= note: `#[warn(unused_variables)]` on by default
warning: `asyncrs` (bin "glomtest") generated 2 warnings
Finished dev [unoptimized + debuginfo] target(s) in 45.75s
Running `target/debug/glomtest`
The image is built with FROM rustlang/rust:nightly
Edit:
➜ docker-compose run --rm builder uname -sr
Linux 5.10.124-linuxkit
Again, I could be missing something here, but I feel like either the documentation is incorrect or Docker is doing something funky.
I've got an M1 (M2, but hopefully the same) and will try to reproduce.
tl;dr: There's a small issue with your code, but running on docker presented some challenges, and I don't know if you even got this far. The issue in the code doesn't lead the code to just hang, so maybe you hit an earlier issue. I am adding my full exploration below. Once you get this running, please let me know what exactly was your issue so we can improve the docs.
Docker issues
First issue I hit is just the ulimit thing. Docker comes with a very low ulimit for memlock, so I just added a big number through docker's command line for testing:
docker run -it --ulimit memlock=1024000 --mount type=bind,src=/Users/glaubercosta/glommio,target=/glommio <img>
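To confirm that the --ulimit flag actually reached the container, one option is to read the limit back from inside it. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper name memlock_soft_limit is made up for this illustration and is not part of glommio.

```rust
use std::fs;

/// Hypothetical helper: extract the soft "Max locked memory" value from the
/// text of /proc/self/limits. Returns the raw token, which is either a byte
/// count or the word "unlimited".
fn memlock_soft_limit(limits: &str) -> Option<String> {
    limits
        .lines()
        .find_map(|line| line.strip_prefix("Max locked memory"))
        .and_then(|rest| rest.split_whitespace().next())
        .map(str::to_string)
}

fn main() {
    // /proc/self/limits only exists on Linux, so a read failure is reported
    // instead of treated as fatal.
    match fs::read_to_string("/proc/self/limits") {
        Ok(text) => match memlock_soft_limit(&text) {
            Some(limit) => println!("memlock soft limit: {limit}"),
            None => println!("no 'Max locked memory' line found"),
        },
        Err(err) => println!("could not read /proc/self/limits: {err}"),
    }
}
```

Running this inside the container with and without --ulimit memlock=1024000 should make it obvious whether the setting was applied.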
Then I hit an issue of too old kernel:
thread 'main' panicked at 'Failed to register a probe. The most likely reason is that your kernel witnessed Romulus killing Remus (too old!! kernel should be at least 5.8)', glommio/src/sys/uring.rs:220:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Note that while we say 5.8 minimum, unfortunately I'm guilty of architecture favoritism here... that's from x86 development and aarch64 could be different.
What that message means is that the following operation failed:
uring_sys::io_uring_get_probe()
That is what io_uring uses to advertise its features, so if it fails we can't know which features are available and we just bail.
I had run glommio on AWS Graviton before, so my guess here is that probes made it into a later kernel for aarch64. Sometimes that happens and support for some arches lag behind.
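Since the panic message names 5.8 as the minimum, a quick sanity check is to compare the running kernel release against it. This is only a sketch: the function names are invented for the example, the 5.8 threshold is taken from the panic message (which comes from x86 development), and passing this check says nothing about memlock limits or capabilities.

```rust
use std::fs;

/// Hypothetical helper: parse "5.10.124-linuxkit" into (5, 10).
fn parse_kernel_release(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.trim().split('.');
    let major = parts.next()?.parse().ok()?;
    // The minor component may carry a suffix such as "1-rc1"; keep the digits only.
    let minor_digits: String = parts
        .next()?
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    let minor = minor_digits.parse().ok()?;
    Some((major, minor))
}

/// True when the release string parses and is at least `min` (major, minor).
fn meets_minimum(release: &str, min: (u32, u32)) -> bool {
    parse_kernel_release(release).map_or(false, |version| version >= min)
}

fn main() {
    // On Linux this file holds the same string that `uname -r` prints.
    match fs::read_to_string("/proc/sys/kernel/osrelease") {
        Ok(release) => {
            let ok = meets_minimum(&release, (5, 8));
            println!("kernel {} meets 5.8 minimum: {}", release.trim(), ok);
        }
        Err(err) => println!("could not read kernel release: {err}"),
    }
}
```

The kernel reported earlier in this thread, 5.10.124-linuxkit, passes this check, which is consistent with the version not being the real problem here.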
In fact I tried to run the probe test from liburing: https://github.com/axboe/liburing/blob/master/test/probe.c
And that failed:
root@6dcff684744c:/glommio/glommio/liburing# ./test/probe
ring setup failed
Now, what's surprising about that message, is that looking at the source code of the test, it fails at ring creation. If that doesn't work, nothing will. And it is extremely unlikely that even ring creation wasn't available on 5.10 for aarch64.
So I tried an even simpler test, ./test/nop, that really just sets up the ring:
glommio/liburing# ./test/nop
ring setup failed: -12
Good! Now we have an error code.. and ENOMEM??
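For anyone wondering how -12 becomes ENOMEM: liburing reports failures as negated errno values, so flipping the sign gives the OS error number. A small sketch of decoding such a return code with the standard library follows; decode_ring_result is a name invented for this example.

```rust
use std::io;

/// Hypothetical helper: turn a liburing-style return code into an io::Error.
/// Negative values are negated errno numbers; zero or positive means success.
fn decode_ring_result(ret: i32) -> Option<io::Error> {
    if ret < 0 {
        Some(io::Error::from_raw_os_error(-ret))
    } else {
        None
    }
}

fn main() {
    // -12 is the value printed by ./test/nop above.
    match decode_ring_result(-12) {
        Some(err) => println!("ring setup failed: {err}"),
        None => println!("ring setup succeeded"),
    }
}
```

The exact wording of the printed message depends on the platform's error strings.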
Let's start with the hammer, and then tailor it down. I added the --privileged flag.
Now the tests both pass:
glommio/liburing# ./test/nop && echo $?
0
# ./test/probe && echo $?
0
--privileged is too hard of a hammer, but it does indicate that this is a Linux capability issue. Reading through, I edited the docker run command to be:
docker run -it --ulimit memlock=1024000 --cap-add cap_ipc_lock --mount type=bind,src=/Users/glaubercosta/glommio,target=/glommio <img>
and the tests keep working. Victory!
Code issues
Does glomtest pass? Well, it returns, but it doesn't print the Hello World.
The reason is the following:
Spawning a task is an asynchronous operation. The task is executed at a later time, and there is no guarantee that the scheduler will switch to it immediately.
So the task is spawned, and the scheduler decides to keep executing the current task. Except that the current task has nothing else to do, and it is the main task within run. So the Executor terminates, and your task terminates with it.
Changing the code to:
let task = glommio::spawn_local_into(
    async {
        println!("Hello world");
    },
    task_queue,
)
.expect("failed to spawn task");
task.await; // <=== this
makes it work. Because now we have a hard guarantee that the task will execute: the main task blocks on that await, and with nothing else to run we switch to the spawned task.
Finished dev [unoptimized + debuginfo] target(s) in 33.27s
Running `target/debug/examples/glomtest`
Hello world
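If it helps to see the scheduling behaviour in isolation, here is a toy model of it using only the standard library. To be clear, this is not glommio's implementation: ToyExecutor, JoinHandle and demo are all invented for this sketch, and the only thing it is meant to mirror is the rule described above, that run returns as soon as the main future finishes and anything still queued never gets a turn.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

type Task = Pin<Box<dyn Future<Output = ()>>>;

/// Toy single-threaded executor (an illustration only, NOT glommio).
#[derive(Clone, Default)]
struct ToyExecutor {
    queue: Rc<RefCell<VecDeque<Task>>>,
}

/// Resolves once the spawned task has run to completion.
struct JoinHandle {
    done: Rc<Cell<bool>>,
}

impl Future for JoinHandle {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if self.done.get() { Poll::Ready(()) } else { Poll::Pending }
    }
}

/// Waker that does nothing; the toy loop simply re-polls.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

impl ToyExecutor {
    /// Queue a task. It does not run until the main future yields.
    fn spawn(&self, fut: impl Future<Output = ()> + 'static) -> JoinHandle {
        let done = Rc::new(Cell::new(false));
        let flag = done.clone();
        self.queue.borrow_mut().push_back(Box::pin(async move {
            fut.await;
            flag.set(true);
        }));
        JoinHandle { done }
    }

    fn run(&self, main_fut: impl Future<Output = ()>) {
        let waker = Waker::from(Arc::new(NoopWake));
        let mut cx = Context::from_waker(&waker);
        let mut main_fut = Box::pin(main_fut);
        loop {
            if main_fut.as_mut().poll(&mut cx).is_ready() {
                // Main finished: whatever is still queued is dropped unrun.
                self.queue.borrow_mut().clear();
                return;
            }
            // Main is waiting on something: give one queued task a turn.
            let next = self.queue.borrow_mut().pop_front();
            match next {
                Some(mut task) => {
                    if task.as_mut().poll(&mut cx).is_pending() {
                        self.queue.borrow_mut().push_back(task);
                    }
                }
                None => return, // nothing left that could make progress
            }
        }
    }
}

/// Spawn a "Hello world" task and optionally await it; returns what ran.
fn demo(await_task: bool) -> Vec<String> {
    let ex = ToyExecutor::default();
    let log = Rc::new(RefCell::new(Vec::new()));
    let (ex2, log2) = (ex.clone(), log.clone());
    ex.run(async move {
        let log3 = log2.clone();
        let task = ex2.spawn(async move {
            log3.borrow_mut().push("Hello world".to_string());
        });
        if await_task {
            task.await;
        }
        log2.borrow_mut().push("main done".to_string());
    });
    let out = log.borrow().clone();
    out
}

fn main() {
    println!("without await: {:?}", demo(false));
    println!("with await:    {:?}", demo(true));
}
```

Without the await, only the main future's entry is logged; with it, the spawned task's entry appears first, because the main future is parked on the handle while the queued task runs.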
Awesome, thank you for the detailed response and for explaining everything out. Interestingly enough I didn't have to do as much docker tweaking as you did to get things running, just the ulimit.
@pbdeuchler based on the above, which parts do you think would be best for us to add to the readme and/or docs for a smoother experience?
The whole thing sounds a bit too much =(
Hey @glommer, sorry for the super late reply, sometimes my work GitHub notifications drown things out. I probably should have paid more attention to the examples, I think it would have made things easier. I was also just getting started with Rust async at the time, so maybe it was just a brain block while learning a new concept. As I've learned more about glommio and io_uring I do have some more questions about idiomatic glommio code... I'll drop into the Zulip and ask them when I get a moment.