nix
nix copied to clipboard
Closure transfer fix for `sched::clone`
Fixes #919
Annotate the correct lifetime ('static
) for the closure transferred to
the child process of the clone.
Properly box the closure in the parent then rebox it in the child to
ensure that the closure is dropped at the end of the invocation in the
child rather than at the end of the call to clone
in the parent.
For some reason, unwrapping the boxed function itself caused all kinds of mayhem, so it instead ends up wrapped in an additional box from which it is then unwrapped.
After looking at how std::thread
achieves the same result, I changed the patch to mimic that instead of doing strange dances with boxes. The (much simpler) solution turns out to be just forgetting the boxed function rather than dropping it.
It also appears that even at its lowest level, in order to implement the FnOnce semantics, libstd::thread
uses a Box<Box<FnBox()>>
which it then unwraps, sends, then boxes again before calling. Still not sure on the why though.
I think it would make more sense to require the callback to be a FnOnce, but I can't see a way of doing that in stable currently.
I'm skeptical of the mem::forget
. That function deliberately leaks memory, and is normally used when you've made a low-level (unsafe) copy of a reference somewhere. But normally at least one copy gets destroyed, and I don't see that happening. Is it really necessary to forget
the callback, when it's already static
?
In this case the 'static
refers to the fact that the function that the box wraps is not bound to the lifetime of any other object (i.e. all items that it refers to have been moved into it). It does not mean that the boxed closure is static.
The existing implementation of clones as you have ti will only work when the child thread does not use the CLONE_VM
flag, meaning it gets its own entire copy of the VM. The reason it doesn't work is because the boxed closure will be dropped by the end of the call to clone, as it is moved into that function and that function consumes it. There is only one copy to this box and the child thread gets a reference to it, but if that box is dropped be for the child thread exits, then the child thread could rely entirely on invalid memory and, in most cases, won't even get to the point of executing the closure.
Either, the box needs to be copied (which it can't) or have its ownership transferred to the child process, or forgotten and never dropped at all. The only to transfer ownership to the child process involves wrapping a the boxed closure in an additional box (I'm still not entirely clear on the why there).
Another alternative fix that could be used here is raising that concern to the caller and using a more direct analogous type to what the syscall is actually expecting and have the wrapper take a fn(T) -> isize
(a function pointer rather than a boxed closure) and a Box<T>
(where T: Send + 'static
) instead. This would leave more abstract implementations of executing closures to caller instead, whcih may be preferable in this case.
I agree. But I'm not comfortable with the double-box solution without knowing why it's necessary. Did you check the history of the file in libstd
that uses that technique? That might be instructive.
So, on your recommendation, I'm going back through the years of commits to https://github.com/rust-lang/rust/blob/master/src/libstd/sys/unix/thread.rs and it goes back quite some way through pre-stable rust (back before ~
notation for boxes was a thing).
I asked around on #rust
and got some help answering the question. The short version is, don't cast between pointers of static and DSTs. DST pointers are actually 'fat' pointers which are a tuple of a pointer to data and a pointer to a vtable. When those get cast to a pointer to a void, the vtable data is dropped. To resolve this, an additional box should be used, so that you have a pointer to a static type (which contains a DST pointer) that can be safely casted to pointer to a static type, like *mut c_void
.
The existing code masked a fatal compile error that would have caught this using a mem::transmute
.
I've fixed it up in a way that should disrupt the interface as little as possible using the double boxing. I've also noticed that the way the stack is passed into clone as a &mut [u8]
is troublesome as that could mean that child thread outlives the memory for its own stack. Fixing that would break the interface.
Linux tests that aren't x86_64 or i386 get executed in QEMU, which has some limitations. You should research if there are any known problems using clone(2) in QEMU. If so, you'll have to disable those tests except for i386 and x86_64.
I don't think this is an issue with QEMU though. On arm (which requires stack alignment), the musl implementation (which fixes the alignment rather than returning an error) works, but the glibc implementations (which passes the value unaltered to the syscall) fails.
Any news on this?
I think we can merge it as soon as we figure out why it fails in QEMU. If anybody has actual arm, mips, or ppc hardware running Linux that would make a good point of comparison.
I suspect that there is a deeper issue in that there is fundamental unsafety in the clone
API and that a direct safe wrapper cannot be provided.
It probably makes more sense to break the API out into several different functions that are actually and that can express their full contract in their type signature.
(The particular issue is that the handling of memory for all of the arguments passed in depends on the mapping properties of that memory and the flags that manage the address space flags of clone.)
Anyone still working on this?