gdal icon indicating copy to clipboard operation
gdal copied to clipboard

Added Send trait to vector structs

Open callmeahab opened this issue 1 year ago • 16 comments

  • [x] I agree to follow the project's code of conduct.
  • [ ] I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

I needed vector structs to be send for my use case, due to use in an async functions. These changes for good for my use case, since I'm wrapping everything in an Arc<Mutex<, but let me know if this might cause potential issues, or if something else is needed to make these changes safer.

callmeahab avatar Jul 05 '23 07:07 callmeahab

r? @ttencate

lnicola avatar Jul 29 '23 16:07 lnicola

Rebasing over master should fix the CI error.

lnicola avatar Jul 29 '23 18:07 lnicola

If merged, this will close #417.

ttencate avatar Jul 29 '23 18:07 ttencate

I don't remember this paragraph; maybe it was added after I filed https://github.com/georust/gdal/issues/417.

That's pretty old, see for example the note at the top of https://gdal.org/drivers/raster/vrt.html#multi-threading-issues.

One option is to keep Send but add a stern warning to the docs. I think there are several more places in the crate already where safe Rust can actually trigger UB.

I don't want this. There are indeed some unsound APIs (ahem read_block), but we should fix them. Exposing this would be pretty horrible.

If everything needs to be made entirely safe, the crate could use the typestate pattern. Every refcounted type would get a generic parameter which indicates whether it's unique, or might be shared. Most GDAL functions would return shared objects, but constructor-like functions return unique ones.

This might work, but I'm not 100% sure yet. It does bring a fair amount of complexity, though.

In the meanwhile, you can usually work around the types being !Send by using channels. That should work even in async code.

lnicola avatar Aug 01 '23 18:08 lnicola

After about a month of using this, I haven't encountered any issues. I am however not using any structs in multiple threads, only async functions.

@ttencate Is the idea behind a typestate to enforce so that certain structs have to be moved therefore preventing accidental use in multi-thread? I'd we willing to take a stab at it, but would need some help with that idea.

callmeahab avatar Aug 01 '23 20:08 callmeahab

After about a month of using this, I haven't encountered any issues. I am however not using any structs in multiple threads, only async functions.

Sure, it's not going to cause problems in practice, but the whole idea of Rust is that you're not supposed to do bad things like this in safe code, no matter how hard you try. There was a bit of a mess a while ago, with a web framework that pulled this kind of trick (implementing Send for !Send types).

Is the idea behind a typestate to enforce so that certain structs have to be moved therefore preventing accidental use in multi-thread?

If I got it right, it's to make sure you go through some hoops when you send them. For example, you have a normal (owned) Dataset that's !Send, you turn it into a SendableDataset which is (and check the reference count at this point), then the receiving thread turns it back into a Dataset. You can't use SendableDataset for anything else, consider this case:

  • you own a Dataset
  • you convert it to its Send flavour, asserting that the reference count is 1
  • you get a RasterBand from it
  • your dataset now has a reference count of 2

There's also the option to use a generic argument to keep track of "sendability", but I think it boils down to the same thing.

This makes some code look nicer (you can pass datasets to other threads instead of paths to open), but doesn't help much in async contexts, where you switch threads all the time.

lnicola avatar Aug 02 '23 05:08 lnicola

What I had in mind with the typestate pattern was something along the lines of Dataset<Unique> vs. Dataset<Shared>, where only Dataset<Unique> is Send. However, I like @lnicola's idea much better, because it doesn't require changing literally all the types in the crate to be generic.

How about this then:

  • We add a trait RefCounted, which has (at least) a function ref_count(&self) -> usize;.
  • Reference-counted objects like Dataset implement this trait.
  • We add a wrapper struct Sender<T> which wraps a T and (unsafely, manually) implements Send. A Sender<T> can be constructed from any T: RefCounted, but the construction fails if the reference count is greater than 1:
pub struct Sender<T>(T);

unsafe impl<T> Send for Sender<T> {}

impl<T: RefCounted> Sender<T> {
    // Could also be a TryFrom impl.
    pub fn try_wrap(inner: T) -> Result<T> {
        if inner.ref_count() <= 1 {
            Ok(Self(inner))
        } else {
            Err(Error::...)
        }
    }

    // Could also be into, but we can't blanket-impl From<Sender<T>> for T due to orphan rules.
    pub fn into_inner(self) -> T {
        self.0
    }
}

However, this is still not fully sound. It can break if T contains references to other ref-counted objects inside it, if those inner objects don't hold a reference back to the T. For example, in the scenario I sketched earlier: OGRFeature has a reference to its OGRFeatureDefn but presumably not the other way round. So if we send a freshly created, uniquely referenced Feature to another thread, both threads have a reference to the underlying Defn.

(In this particular case, the Defn is probably only read from, and concurrent reads are probably okay, but neither is guaranteed.)

ttencate avatar Aug 02 '23 10:08 ttencate

How about this then

Yup, that looks like a nice API.

In this particular case, the Defn is probably only read from, and concurrent reads are probably okay, but neither is guaranteed.

Maybe I'm missing something, but won't destroying the Feature touch the reference count of the Defn?

lnicola avatar Aug 02 '23 11:08 lnicola

Yes, I suppose that's a data race.

ttencate avatar Aug 02 '23 12:08 ttencate

Well, given all this, I don't think making the GDAL objects Send is a good idea.

lnicola avatar Aug 02 '23 12:08 lnicola

Ok, is there a work around for async functions in that case? Just so I don't have to sync this branch to master indefinitely :)

callmeahab avatar Aug 02 '23 13:08 callmeahab

I don't know your app, but you can do something like:

  • make an IO thread or thread pool, each with an associated async MPSC channel
  • in your request handler, make a new one-shot channel for the response and send it to an IO thread, along with the request data
  • await on the one-shot channel for the response
  • in the IO threads, block while waiting for a request, process it when it arrives, then send the response back

https://github.com/georust/gdal/issues/425 is somewhat related, too.

lnicola avatar Aug 02 '23 13:08 lnicola

Well, given all this, I don't think making the GDAL objects Send is a good idea.

Not safely. But an unsafe API, where the user is responsible for upholding GDAL's thread safety rules, would still be better than no API at all.

ttencate avatar Aug 02 '23 13:08 ttencate

I could get behind an API like your Sender above, if its docs include a copious amount of warnings.

lnicola avatar Aug 02 '23 13:08 lnicola

I don't know your app, but you can do something like:

  • make an IO thread or thread pool, each with an associated async MPSC channel
  • in your request handler, make a new one-shot channel for the response and send it to an IO thread, along with the request data
  • await on the one-shot channel for the response
  • in the IO threads, block while waiting for a request, process it when it arrives, then send the response back

#425 is somewhat related, too.

Thanks. The app is basically a rabbitmq consumer(deadpool-lapin ) running in a loop waiting for messages, once a message is received it then sends a message to an actor(coerce-rs), which is where gdal functions are invoked, after gdal finishes that actor further delegates work to other actors (db, s3 etc).

Coerce is working basically how you're describing this to be implemented, only thing is they don't provide synchronous handlers, which is why I had to make this workaround. I'm also thinking about implementing a synchronous handler that can return a Box<Pin<Future<...>>> and get resolved later, that way I could call gdal functions in the synchronous part of the handler and just pass the data to async part, at which point gdal structs can safely be dropped.

That being said I also like the idea of implementing a Sender struct to handle unsafe code.

callmeahab avatar Aug 02 '23 20:08 callmeahab

I just ran into this myself for using SpatialRef and I'm not convinced the following statement is true.

A problem exists even with structs that don't have a lifetime parameter, like SpatialRef. It is also internally refcounted, so I can Clone it and Send the clone to another thread, then call (mutating) functions on both copies concurrently, causing UB.

While the underlying OGRSpatialReference from GDAL is internally reference counted, the implementation here does not expose or use any explicit reference counting methods. The implementation for Clone correctly calls OSRClone(...), rather than simply using a reference count: https://github.com/georust/gdal/blob/204e4700f71dcc63f1f4993f2d0a5fe118f92208/src/spatial_ref/srs.rs#L27-L32

Examining the code on GDAL's side we can see when OSRClone(...) is called they make a proper deep copy as well, and don't just increase the reference count: https://github.com/OSGeo/gdal/blob/e4026b077a092f614dc8a25b0025f01080eee47d/ogr/ogrspatialreference.cpp#L1376-L1392

Given the above, I believe at least SpatialRef can be safely marked as Send in its current implementation. I do think there could be some possibility for unsafe behavior between threads if a user chooses to use to_c_hsrs(&self) -> OGRSpatialReferenceH. I could see marking the method as unsafe in the event SpatialRef implements Send. I haven't reviewed any of the other types in detail at this point, so I cannot speak to them directly, but it's possible some other reference counted types could also be Send with close inspection.

BlakeOrth avatar Nov 01 '23 23:11 BlakeOrth