rutie icon indicating copy to clipboard operation
rutie copied to clipboard

Premature `free` when Rust holds Ruby object's in a `Vec`

Open ZhangHanDong opened this issue 7 years ago • 20 comments

Related issues:

If we: heap allocate a struct-based helix object (e.g. Duration::new) GC runs It will GC the Ruby object associated with the helix object, because the GC couldn't see the VALUE pointer on the stack

ZhangHanDong avatar Jul 18 '18 16:07 ZhangHanDong

The specific error I opened in Ruru that you linked to was because I use Vec rather than an array [] for method arguments sent to Ruby. The appropriate link explaining that for this project is https://github.com/danielpclark/rutie/issues/3#issuecomment-401515443 and has been resolved thanks to @irxground :+1: . The README#segfault-during-gc-when-using-a-ruby-method-written-in-c has a troubleshooting guide for when this occurs now.

I'll close the issue in Ruru as it's a user related error (my fault) rather than a Ruru one.

If Helix happens to use heap allocation for Ruby objects that the Ruby GC looks for then there might be an occasional GC issue with that. It's safer to use Rust's native array type when passing Ruby objects to Ruby's C code (rather than Vec) to avoid this.

I hope that was helpful. :bowing_man:

danielpclark avatar Jul 18 '18 19:07 danielpclark

@danielpclark thanks

ZhangHanDong avatar Jul 19 '18 03:07 ZhangHanDong

@ZhangHanDong This issue is being revisited/discussed here: https://github.com/danielpclark/rutie/pull/76#issuecomment-454582628 Things may not be as we originally thought. But we need to prove it.

danielpclark avatar Jan 16 '19 15:01 danielpclark

This issue is being revisited/discussed here: #76 (comment) Things may not be as we originally thought

I believe it is the vec! that caused the segfault.

  • When collecting garbage, ruby GC will scan stack to determine objects who is unreachable. Unreachable objects will be collected.
  • Any ruby C API that allocating memory from GC has an opportunity to trigger a GC collecting.

Let's see your orignal code:

Class::from_existing("Pathname").new_instance(
    Some(&vec![RString::new(path).to_any_object()])
)

It roughly equal to:

let args = &vec![RString::new(path).to_any_object()];
// the ruby object in args is not reachable from stack from the GC's point of view at this time.

let RPathname = Class::from_existing("Pathname"); // this may or may not alloc depending on it's implementation I never know.
RPathname.new_instance(Some(args)) // This call will definetely alloc memory first for the new object, then pass the args to the constructor of Patchname.

When the RPathname.new_instance allocating memory, it has an opportunity to trigger GC collecting which will collect the objects in the vector args. That's cause dangling pointers.

kvinwang avatar Jan 17 '19 01:01 kvinwang

Thanks @kvinwang . cc @dsander

danielpclark avatar Jan 17 '19 02:01 danielpclark

I think we can make a better solution by leveraging Rust's traits system. Define something like:

trait ToRubyValue {
    fn to_ruby_value(&self) -> RValue;
}

Then the caller side only needs to create pure rust heap objects. All converting codes will be limited in a controllable range in the framework.

Edit: In this way, the caller won't worry about putting objects on the stack or heap.

kvinwang avatar Jan 17 '19 02:01 kvinwang

After thinking a while, I suggest reopening this issue. In rutie, there are tons of Rust APIs creating Ruby objects and rely on the created objects to be put on the stack to ensure memory safety. In the Rust world, unsafe codes have the responsibility to guarantee safe codes to be 100% memory safe. Rutie should not rely on the caller to put those objects on the stack.

See: https://doc.rust-lang.org/nomicon/working-with-unsafe.html

kvinwang avatar Jan 18 '19 01:01 kvinwang

After thinking a while, I suggest reopening this issue.

I agree, I'll copy over a some of the discussion we had in #76.

We have a pretty simple reproduction of a GC issue here: https://github.com/dsander/rutie/commit/b752b1deec17acb7eedb03e782741db5656eb7b3#diff-aa50bf5ef2d5964c7306bee65070231c

This segfault every run with the default GC configuration on ruby 2.5. Calling GC.disable in the ruby script "fixes" the issue, so I am relatively certain that rubys GC is causing the problem an not rust by dropping values.

I tried to a similar solution nokogiri used to fix a segfault https://github.com/sparklemotion/nokogiri/commit/0a1556a7c0ac1dc58c074a06f639bba3ddabab3f here https://github.com/dsander/rutie/commit/fed4275fac200603cb66232465deacacddb59294#diff-aa50bf5ef2d5964c7306bee65070231c but it does not work.

https://www.ruby-forum.com/t/rb-gc-register-address-or-rb-gc-mark/219828/13

rb_gc_register_address is for use with fixed memory addresses like: static VALUE string_aaa; Using it on pointers to the heap or stack is going to result in undefined behavior.

If that is true I don't think we can use rb_gc_register_address in our rust code.

https://github.com/sparklemotion/sqlite3-ruby/pull/172/files also looks like a similar to the issue we are having. I tried putting the contents of the arguments slice in VM::call_method in a ruby Array but that did not help either.

Is there a way to definitely put variables on the stack in rust apart from lazy_static?

dsander avatar Jan 20 '19 09:01 dsander

I tried putting the contents of the arguments slice in VM::call_method in a ruby Array but that did not help either

How did you do this?

Is there a way to definitely put variables on the stack in rust apart from lazy_static ?

I don't think there is a way in rust to constrain a type to only allow it's instances to be put on the stack.

The only way in my mind is avoiding create real ruby objects in rust. Instead, we should create pure rust objects which can be lazily converted into ruby objects. The ruby APIs which change ownership of ruby objects(e.g. Array.pop) also must be marked as unsafe and doc on its return value must be put on the stack or it may introduce detached objects.

kvinwang avatar Jan 20 '19 11:01 kvinwang

In rutie, there are tons of Rust APIs creating Ruby objects and rely on the created objects to be put on the stack to ensure memory safety. In the Rust world, unsafe codes have the responsibility to guarantee safe codes to be 100% memory safe.

The ruby APIs which change ownership of ruby objects(e.g. Array.pop) also must be marked as unsafe and doc on its return value must be put on the stack or it may introduce detached objects.

@kvinwang I believe you are mostly correct. There is more to unsafe than just the possibility for panic though. We are probably in agreement on this but I want to write it out to clarify the stance on using unsafe.

Rutie should not rely on the caller to put those objects on the stack.

Correct.

unsafe

There are two rules for unsafe safety.

  1. There is a risk of panic
  2. It is the users responsibility that when they use any unsafe methods that they must ensure the safety themselves

So in everything you've said I agree with you in principle on #1, but I don't think that the issue we're having of ownership between Rust & Ruby and a double free occurring is the “users responsibility” rather I believe that should be both “Rutie and the developers of Rutie's responsibility” to ensure ownership rules and lifetime rules are adhered to.

Since it's Rutie's responsibility to expose an API for using Ruby that ensures proper ownership/lifetime/memory-freeing code I don't think any of the methods that fail to do this should be marked unsafe, even though technically they are, because it's not the user of Rutie's responsibility to make it safe. The time for ensuring safety happens both within the instantiation of a Ruby object in Rust with ::new methods and the time Drop gets called. So we are responsible for making Rutie safe in object creation and destruction and it's not anyone else's responsibility.

Methods that are to be marked unsafe are methods like GC::is_marked, which is unsafe for sure, and only intended to be used in debugging/testing by users who must ensure the safety by verifying the GC is not running it's sweep action as the method would then be unusable and cause UB. This is the purpose of unsafe, that they are responsible for the safety.

So because of #2 I don't think we should, and am against, a rewrite of everything in Rust to simply have unsafe on the front of it because of Vec behavior. The truth is it's not their responsibility, it's ours.

In the meantime it's much easier to simply ask users to not use Vec until we get the code to be compatible with it.

danielpclark avatar Jan 20 '19 17:01 danielpclark

@dsander Is it possible to reproduce the bug purely from the Rust side now that we have GC methods? We can create a test directory for having a unit test for specific bugs. For now the test should assert that it does panic. And it will be much faster to try to solve the issue in Rust with testing against cargo test.

I'm making this thread the official issue number for the Vec issue. When we have a unit test for this we can label the test file test/issue_21_bug_test.rs. And in commit messages we should tag the end of the commit - #21 to refer to this issue whenever it is applicable. That will help track the issue here more clearly.

We don't need to copy any more of the #76 comments here. It's fine to leave that there and as is.

danielpclark avatar Jan 20 '19 18:01 danielpclark

GC/Drop Idea

Just an idea. Since Ruby instantiates object's memory through the GC and cleans up objects from memory through the GC I think we could implement a Drop feature in Rust that first checks if Ruby's GC knows about the item and is tracking it. If it is then we can simply have Rust forget about it with std::mem::forget and just let the GC clean the memory when it's ready to. If the GC doesn't have the object referenced in it then we can allow Rust to continue to free it as it normally would.

danielpclark avatar Jan 20 '19 22:01 danielpclark

When I sad this:

The ruby APIs which change ownership of ruby objects(e.g. Array.pop) also must be marked as unsafe

I meant it has the precondition that we solve the issue in this way:

The only way in my mind is avoiding create real ruby objects in rust. Instead, we should create pure rust objects which can be lazily converted into ruby objects.

Because those API may cause objects unreachable from GC but reachable from rust to come out and cause UB subsequently, So they must be marked as unsafe.

The truth is it's not their responsibility, it's ours.

I agree. If rutie can manage the lifetime/ownership of those objects well, it definitely doesn't need to mark them as unsafe. But I doubt we can do it easily. There isn’t a clear/simple way in my mind to manage those well. We can not simply Drop/free those objects。 So, If we can not find a way to do it well, we must mark those API as unsafe. Since rutie serves as a rust crate, It must obey all safety rules defined in rust.

1. There is a risk of panic

Did you mean panic by crash? They are different things in rust. unsafe mostly means things will cause UB. panic is a defined thing in rust. In our issue, the dangling pointer and double free are unsafe things. Whether crashing or not is not important.


GC/Drop Idea

Just an idea. Since Ruby instantiates object's memory through the GC and cleans up objects from memory through the GC I think we could implement a Drop feature in Rust that first checks if Ruby's GC knows about the item and is tracking it. If it is then we can simply have Rust forget about it with std::mem::forget and just let the GC clean the memory when it's ready to. If the GC doesn't have the object referenced in it then we can allow Rust to continue to free it as it normally would.

I don't think this is a good idea:

  • It is expensive to check if GC knows about an item.
  • We are not able to prevent the item from being freed by GC in this way. (It doesn't solve the original issue)
  • When the item is passed to ruby from rust and then be passed back to rust again, things become complicated。

kvinwang avatar Jan 21 '19 02:01 kvinwang

How did you do this?

The latest version even a static global array, but it does not change anything: https://github.com/dsander/rutie/commit/44b5028f9a2563acbdba240d1b6794a432391620

The only way in my mind is avoiding create real ruby objects in rust. Instead, we should create pure rust objects which can be lazily converted into ruby objects.

Does it matter at which point we convert the Rust objects to Ruby objects? As far as I understand the issue the segfault happens because the Ruby GC does not know the Ruby objects we created in Rust are in use and then frees them.

@dsander Is it possible to reproduce the bug purely from the Rust side now that we have GC methods?

I have not tried that, I think it will depend on the way how the Ruby VM works when it's started from Rust.

GC/Drop Idea

Not sure if we even need to worry about that, in Rust the wrapped Ruby objects are just a pointer to the Value. Shouln't dropping the pointer be fine? I am assuming that dropping the pointer does not mess with the data Ruby wrote at the destination of the pointer.

dsander avatar Jan 21 '19 09:01 dsander

I don't think this is a good idea:

  • It is expensive to check if GC knows about an item.
  • We are not able to prevent the item from being freed by GC in this way. (It doesn't solve the original issue)
  • When the item is passed to ruby from rust and then be passed back to rust again, things become complicated。

I believe you are correct.

GC/Drop Idea

Not sure if we even need to worry about that, in Rust the wrapped Ruby objects are just a pointer to the Value. Shouldn't dropping the pointer be fine? I am assuming that dropping the pointer does not mess with the data Ruby wrote at the destination of the pointer.

I believe you are correct here as well.

danielpclark avatar Jan 21 '19 12:01 danielpclark

The latest version even a static global array

The Array should be on the stack, not on the static section. As you have mentioned, If it is put on the static, it is needed to manually register it's address into GC.

Does it matter at which point we convert the Rust objects to Ruby objects?

I mean in that way, we can prevent the ruby objects to be exposed to the user side. Since they must be put on the stack, if they are exposed to the user, things will be out of control.

I had thought about another idea (pseudocode):


mod rroot {
    
    pub struct Root {
        arr: Array // a ruby array
    }
    impl Root {
    
        fn new() -> Root {
        // keep this function private so that users cannot create a Root instance explicitly.
        // So we can ensure that there is no chance to put it on the heap.
            Root { arr: Array::new() }
        }

        pub fn refer(&self, value: Value) {
            self.arr.push(value);
        }
    }
    
    pub fn with_root(blk: impl FnOnce(&Root)) {// expose this function to the user is safe
        let root = Root::new(); // This is the only point creating Root instance and it is always on the stack.
        blk(&root);
    }
}


fn main() {
    use rroot::with_root;
    
    with_root(|root| {
        let pathname = Class::from_existing("Pathname").new_instance(Some(
        &vec![
            // Any API which creating ruby objects need to be changed to accept a Root instance.
            RString::new_utf8(
                root/* Inside new_utf8, we let root refer to the new RString instance, so that it is reachable from stack */,
                path
            ).to_any_object()
            ]));
    });
}

But it seems very tedious.

kvinwang avatar Jan 21 '19 12:01 kvinwang

@kvinwang I'm not sure we want to avoid the heap because of this issue. I'd like to be able to use a Vec.

If the program running is a Rust program we always start the VM with VM::init and when it's a Ruby program the VM is running. All Value objects are created in Ruby and yes it is held in Rust via a pointer:

#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq)]
pub struct Value {
    pub value: InternalValue,
}
pub type InternalValue = uintptr_t;

Some types we should never have issue with is Integer/Fixnum because both Rust implements Copy on numeric types and Ruby does not GC those types. Also I believe Symbol in Ruby is not GC'd either.

The problem we are able to reproduce is with Ruby strings created from Rust. Think about how we're going about it. We produce a native String in Rust and give Ruby access to it by giving it a pointer to the start and the length of the string. Rust sill owns it, but now Ruby owns it too. I think at this point we should let Rust std::mem::forget the native Rust String type as Ruby now holds that memory for the Ruby String within its own VALUE object. This is my best guess at a fix for the particular issue we are able to reproduce.

I think the problem lies in the fact that when the scope of the Rust code closes then it calls free on the Rust String originally given to Ruby via a pointer into Rust and Ruby is still using it. Although why this happens in a Vec and not an array is still confusing to me. ¯\_(ツ)_/¯

danielpclark avatar Jan 21 '19 15:01 danielpclark

Although why this happens in a Vec and not an array is still confusing to me. ¯\_(ツ)_/¯

Maybe because Ruby does indeed store its objects on the heap and so when it's on the stack from Rust it doesn't free the stack via GC (because it possibly doesn't look there) but Rust does, but when it's on the heap it sees it the same as its own objects and tries to free something that Rust has already freed from the heap.

On second thought this doesn't make sense. It would mean that even though we're putting RString objects in a Vec that the compiler would infer the String used in RString::new_ would need to be moved from stack to heap… which then also means the pointer given to Ruby would no longer be correct.

danielpclark avatar Jan 21 '19 15:01 danielpclark

The Array should be on the stack, not on the static section. As you have mentioned, If it is put on the static, it is needed to manually register it's address into GC.

Thanks for the hint, I tried adding GC::register(&*GUARD.lock().unwrap()); but that didn't help.

Disabling the GC for the duration of the call of the Ruby method works, but that isn't a real solution 😄

I had thought about another idea (pseudocode):

That looks interesting, can you explain how this works? I don't understand why it would put the Root instances on the stack.

Also I believe Symbol in Ruby is not GC'd either.

I believe that as well, as every given symbol is only created once, GCing them would probably cost more than it saves.

I think at this point we should let Rust std::mem::forget the native Rust String type as Ruby now holds that memory for the Ruby String within its own VALUE object.

I am pretty sure that Rust dropping memory isn't the source of the segfault we are seeing (even though it might cause other problems down the road). The reproduction segfaults during this iteration, the first few hundred strings are printed to STDOUT and after the GC ran it segfaults because the contents of args have been GC'ed. Disabling the GC in Ruby (or Rust) allows the program to finish.

Although why this happens in a Vec and not an array is still confusing to me. ¯_(ツ)_/¯

I already forgot where I read it, but the source stated that the Ruby GC will look through the programs stack and mark (i.e. not GC) every memory location that "looks" like a Ruby VALUE. So every VALUE (or references to them?) we put on the stack are not GC'ed. As far as I know Rust puts arrays on the stack because the size is known at compile time.

dsander avatar Jan 21 '19 20:01 dsander

Rust sill owns it, but now Ruby owns it too

No, I believe the RString::new_utf8 will copy the contents, otherwise, it won't work anywhere.


Although why this happens in a Vec and not an array is still confusing to me. ¯_(ツ)_/¯

My first comment explained what was happening.


I tried adding GC::register(&*GUARD.lock().unwrap()); but that didn't help.

There is a bug in the binding:

pub fn register(obj: Value) {//should be 'obj: &Value'
    let addr = &obj as *const _ as CallbackPtr;

    unsafe { gc::rb_gc_register_address(addr) }
}

The obj is always copied into the fn register, then &obj will always get a local address in the stack frame.


That looks interesting, can you explain how this works? I don't understand why it would put the Root instances on the stack.

The rust compiler will put everything on the stack except there are some codes explicitly allocating heap memory.

But the vec is on the heap? Because the Vec has a pointer inside:

pub struct RawVec<T, A: Alloc = Global> {
    ptr: Unique<T>,
    cap: usize,
    a: A,
}

when there is something be put into the vec, it will allocate heap memory for the ptr. When we do let v = vec![1,2];, the struct Vec itself is still on the stack, only the value of 1,2 are put on the heap. So my code above always put the struct Root on the stack.

    pub fn with_root(blk: impl FnOnce(&Root)) {
        let root = Root::new(); // the root is always on the stack.
        blk(&root);
    }

But I don't think this can be the solution, it it not %100 safe.

kvinwang avatar Jan 22 '19 01:01 kvinwang