gdal
gdal copied to clipboard
Added Send trait to vector structs
- [x] I agree to follow the project's code of conduct.
- [ ] I added an entry to
CHANGES.md
if knowledge of this change could be valuable to users.
I needed vector structs to be send for my use case, due to use in an async functions. These changes for good for my use case, since I'm wrapping everything in an Arc<Mutex<
, but let me know if this might cause potential issues, or if something else is needed to make these changes safer.
r? @ttencate
Rebasing over master
should fix the CI error.
If merged, this will close #417.
I don't remember this paragraph; maybe it was added after I filed https://github.com/georust/gdal/issues/417.
That's pretty old, see for example the note at the top of https://gdal.org/drivers/raster/vrt.html#multi-threading-issues.
One option is to keep Send but add a stern warning to the docs. I think there are several more places in the crate already where safe Rust can actually trigger UB.
I don't want this. There are indeed some unsound APIs (ahem read_block
), but we should fix them. Exposing this would be pretty horrible.
If everything needs to be made entirely safe, the crate could use the typestate pattern. Every refcounted type would get a generic parameter which indicates whether it's unique, or might be shared. Most GDAL functions would return shared objects, but constructor-like functions return unique ones.
This might work, but I'm not 100% sure yet. It does bring a fair amount of complexity, though.
In the meanwhile, you can usually work around the types being !Send
by using channels. That should work even in async
code.
After about a month of using this, I haven't encountered any issues. I am however not using any structs in multiple threads, only async functions.
@ttencate Is the idea behind a typestate to enforce so that certain structs have to be moved therefore preventing accidental use in multi-thread? I'd we willing to take a stab at it, but would need some help with that idea.
After about a month of using this, I haven't encountered any issues. I am however not using any structs in multiple threads, only async functions.
Sure, it's not going to cause problems in practice, but the whole idea of Rust is that you're not supposed to do bad things like this in safe code, no matter how hard you try. There was a bit of a mess a while ago, with a web framework that pulled this kind of trick (implementing Send
for !Send
types).
Is the idea behind a typestate to enforce so that certain structs have to be moved therefore preventing accidental use in multi-thread?
If I got it right, it's to make sure you go through some hoops when you send them. For example, you have a normal (owned) Dataset
that's !Send
, you turn it into a SendableDataset
which is (and check the reference count at this point), then the receiving thread turns it back into a Dataset
. You can't use SendableDataset
for anything else, consider this case:
- you own a
Dataset
- you convert it to its
Send
flavour, asserting that the reference count is 1 - you get a
RasterBand
from it - your dataset now has a reference count of 2
There's also the option to use a generic argument to keep track of "sendability", but I think it boils down to the same thing.
This makes some code look nicer (you can pass datasets to other threads instead of paths to open), but doesn't help much in async contexts, where you switch threads all the time.
What I had in mind with the typestate pattern was something along the lines of Dataset<Unique>
vs. Dataset<Shared>
, where only Dataset<Unique>
is Send
. However, I like @lnicola's idea much better, because it doesn't require changing literally all the types in the crate to be generic.
How about this then:
- We add a trait
RefCounted
, which has (at least) a functionref_count(&self) -> usize;
. - Reference-counted objects like
Dataset
implement this trait. - We add a wrapper struct
Sender<T>
which wraps aT
and (unsafely, manually) implementsSend
. ASender<T>
can be constructed from anyT: RefCounted
, but the construction fails if the reference count is greater than 1:
pub struct Sender<T>(T);
unsafe impl<T> Send for Sender<T> {}
impl<T: RefCounted> Sender<T> {
// Could also be a TryFrom impl.
pub fn try_wrap(inner: T) -> Result<T> {
if inner.ref_count() <= 1 {
Ok(Self(inner))
} else {
Err(Error::...)
}
}
// Could also be into, but we can't blanket-impl From<Sender<T>> for T due to orphan rules.
pub fn into_inner(self) -> T {
self.0
}
}
However, this is still not fully sound. It can break if T
contains references to other ref-counted objects inside it, if those inner objects don't hold a reference back to the T
. For example, in the scenario I sketched earlier: OGRFeature
has a reference to its OGRFeatureDefn
but presumably not the other way round. So if we send a freshly created, uniquely referenced Feature
to another thread, both threads have a reference to the underlying Defn
.
(In this particular case, the Defn
is probably only read from, and concurrent reads are probably okay, but neither is guaranteed.)
How about this then
Yup, that looks like a nice API.
In this particular case, the Defn is probably only read from, and concurrent reads are probably okay, but neither is guaranteed.
Maybe I'm missing something, but won't destroying the Feature
touch the reference count of the Defn
?
Yes, I suppose that's a data race.
Well, given all this, I don't think making the GDAL objects Send
is a good idea.
Ok, is there a work around for async functions in that case? Just so I don't have to sync this branch to master indefinitely :)
I don't know your app, but you can do something like:
- make an IO thread or thread pool, each with an associated async MPSC channel
- in your request handler, make a new one-shot channel for the response and send it to an IO thread, along with the request data
-
await
on the one-shot channel for the response - in the IO threads, block while waiting for a request, process it when it arrives, then send the response back
https://github.com/georust/gdal/issues/425 is somewhat related, too.
Well, given all this, I don't think making the GDAL objects
Send
is a good idea.
Not safely. But an unsafe
API, where the user is responsible for upholding GDAL's thread safety rules, would still be better than no API at all.
I could get behind an API like your Sender
above, if its docs include a copious amount of warnings.
I don't know your app, but you can do something like:
- make an IO thread or thread pool, each with an associated async MPSC channel
- in your request handler, make a new one-shot channel for the response and send it to an IO thread, along with the request data
await
on the one-shot channel for the response- in the IO threads, block while waiting for a request, process it when it arrives, then send the response back
#425 is somewhat related, too.
Thanks. The app is basically a rabbitmq consumer(deadpool-lapin
) running in a loop waiting for messages, once a message is received it then sends a message to an actor(coerce-rs
), which is where gdal functions are invoked, after gdal finishes that actor further delegates work to other actors (db, s3 etc).
Coerce is working basically how you're describing this to be implemented, only thing is they don't provide synchronous handlers, which is why I had to make this workaround. I'm also thinking about implementing a synchronous handler that can return a Box<Pin<Future<...>>>
and get resolved later, that way I could call gdal functions in the synchronous part of the handler and just pass the data to async part, at which point gdal structs can safely be dropped.
That being said I also like the idea of implementing a Sender
struct to handle unsafe code.
I just ran into this myself for using SpatialRef
and I'm not convinced the following statement is true.
A problem exists even with structs that don't have a lifetime parameter, like
SpatialRef
. It is also internally refcounted, so I canClone
it andSend
the clone to another thread, then call (mutating) functions on both copies concurrently, causing UB.
While the underlying OGRSpatialReference
from GDAL is internally reference counted, the implementation here does not expose or use any explicit reference counting methods. The implementation for Clone
correctly calls OSRClone(...)
, rather than simply using a reference count:
https://github.com/georust/gdal/blob/204e4700f71dcc63f1f4993f2d0a5fe118f92208/src/spatial_ref/srs.rs#L27-L32
Examining the code on GDAL's side we can see when OSRClone(...)
is called they make a proper deep copy as well, and don't just increase the reference count:
https://github.com/OSGeo/gdal/blob/e4026b077a092f614dc8a25b0025f01080eee47d/ogr/ogrspatialreference.cpp#L1376-L1392
Given the above, I believe at least SpatialRef
can be safely marked as Send
in its current implementation. I do think there could be some possibility for unsafe behavior between threads if a user chooses to use to_c_hsrs(&self) -> OGRSpatialReferenceH. I could see marking the method as unsafe
in the event SpatialRef
implements Send
. I haven't reviewed any of the other types in detail at this point, so I cannot speak to them directly, but it's possible some other reference counted types could also be Send
with close inspection.