nix icon indicating copy to clipboard operation
nix copied to clipboard

`sched::clone` doesn't pass ownership of closure to new thread.

Open xurtis opened this issue 6 years ago • 10 comments

The current implementation of of sched::clone doesn't pass ownership of the the closure called by callback into the newly created thread, only a reference. This means that when the call to clone ends, the callback is actually dropped, releasing all associated memory. This most likely means that the closure that actually gets run in the child thread is a reference to invalid memory when it is run.

Recommended fix is to unbox the CloneCb into a raw pointer and box it back up again in callback. This would mean that the closure would be dropped at the end of the callback in the child thread, rather than at the end of the call to clone.

xurtis avatar Jul 04 '18 05:07 xurtis

I'll have a patch in a few minutes.

xurtis avatar Jul 04 '18 05:07 xurtis

Without https://github.com/rust-lang/rust/issues/28796 being stabilised, sched::clone shouldn't support actual closures at all. It makes no sense for clone to accept a closure with references as the child thread then assumes that everything being referenced lives longer than itself and the parent thread assumes that all borrows that the child thread had were dropped when the call to clone succeeded.

In light of this, I think I'll make a patch to change clone to accept something more like what the syscall does now (i.e. fn(T) -> isize with a T passed into clone.

xurtis avatar Jul 04 '18 06:07 xurtis

This is still a problem, and makes clone completely broken (rather than the expected horribly unsafe because it's clone). FnBox has been fixed for a while, though it would be a breaking change to improve CloneCb to be Box<FnOnce()> instead of Box<FnMut()>. That said, fixing the other correctness issues there to make it Box<FnMut() -> isize + Send + 'static> instead would also be breaking, so might as well make it FnOnce() at the same time.

On top of that &mut [u8] is not a great choice for the stack type, as you need the borrow for up to 'static, not just for the duration of clone(), and the most obvious ways to allocate [u8; stack size] will break in practice.

talchas avatar Jul 28 '21 06:07 talchas

@talchas would you like to try rewriting clone? I wouldn't worry about preserving backwards-compatibility; it probably can't be done in a way that also fixes clone's problems.

asomers avatar Aug 09 '21 21:08 asomers

I'd go with

    pub fn clone<F: FnOnce() -> c_int + Send + 'static>(
        cb: F,
        stack: *mut [u8],
        flags: CloneFlags,
        signal: Option<c_int>,
    ) -> Result<Pid> {
        extern "C" fn callback<F: FnOnce() -> c_int + Send + 'static>(data: *mut c_void) -> c_int {
            let cb: Box<F> = unsafe { Box::from_raw(data as *mut F) };
            (*cb)()
        }

        let cb = Box::into_raw(Box::new(cb));
        let res = unsafe {
            let combined = flags.bits() | signal.unwrap_or(0);
            let ptr = (stack as *mut u8).add((*stack).len());
            let ptr_aligned = ptr.sub(ptr as usize % 16);
            libc::clone(
                callback::<F>,
                ptr_aligned as *mut c_void,
                combined,
                cb as *mut c_void,
            )
        };
        let res = Errno::result(res).map(Pid::from_raw);

        // In CLONE_VM we need to free the callback on both sides
        if flags.contains(CloneFlags::CLONE_VM) || res.is_err() {
            unsafe { Box::from_raw(cb); }
        }

        res
    }

but there's a lot of fairly reasonable bikeshed options: taking *mut F (and probably having users just leak it or do it wrong) or Box<FnOnce() -> c_int + Send + 'static>, maybe taking Box<[u8]> for stack. Then there's the option of not doing closures at all since they're still kinda sketchy in CLONE_VM (anything calling into the allocator in the new mostly-process is), though then the benefit vs libc is small.

talchas avatar Aug 09 '21 23:08 talchas

  • Taking raw pointers isn't much better than having the user call FFI functions directly. Could stack be a &'static mut [u8] instead? Granted, it would probably have to be unsafely created. Or if you take Box<[u8]> for the stack, who would be responsible for deallocating it, the parent, the child, or both?
  • What's the point of taking a closure argument if it must be 'static? Is that any different than taking a plain function pointer? If there is a good reason to use a closure argument, then your Box::into_raw logic looks correct.
  • I think your logic for CLONE_VM is backwards. The parent should free the callback, unless that flag is set.
  • Finally, this function should definitely be unsafe.

asomers avatar Aug 11 '21 00:08 asomers

Yeah, I didn't do &'static because it would need unsafe anyways, and didn't do Box<[]> because of the freeing issues, and statically allocating a buffer is maybe useful sometimes.

You still want a closure, even if it's 'static; you can move in a String/ints/anything else owned still.

Yes, I forgot which way around CLONE_VM worked, and making it unsafe is probably good too.

talchas avatar Aug 11 '21 00:08 talchas

Hm. The child process/thread thing will never free stack, and unless CLONE_VM is set then the parent can do whatever it wants with the stack after return. So maybe it doesn't need to be 'static. Would it be better to make stack a &mut [u8], with a note in the documentation that it must outlive the child process if CLONE_VM is set? Or should we remove CLONE_VM from CloneFlags, and create a separate clone_vm function that requires a static array for stack?

asomers avatar Aug 11 '21 00:08 asomers

Having a separate CLONE_VM certainly could be reasonable, as the mechanics around memory safety and lifetimes are completely different. For non-VM ones you can drop the 'static requirement, use &mut [u8], and use &mut cb rather than boxing it (but that one definitely should be unsafe for similar reasons to CommandExt::pre_exec).

Looking more at clone(2), I guess there's also the issue that all the SETTID/CLEARTID/PIDFD stuff causes it to write to the varargs slots which there is no way to fill in, so setting those flags has no use other than being UB. Similarly, SETTLS can't be specified, and I guess would be reason enough to make clone_vm() be unsafe as well, since tls is assumed to be set up properly for safe code.

talchas avatar Aug 11 '21 01:08 talchas

Yeah, good catch. You should remove those flags too, or create separate functions to enable their use.

asomers avatar Aug 11 '21 02:08 asomers