Implement Send for Document
Document is declared as follows:
pub(crate) struct _Document {
/// pointer to a libxml document
pub(crate) doc_ptr: xmlDocPtr,
/// hashed pointer-to-Node bookkeeping table
nodes: HashMap<xmlNodePtr, Node>,
}
Now, looking into libxml sources, neither struct _xmlDoc nor struct _xmlNode contain any thread local data. Therefore, it should be possible for the Document to have Send trait.
You're right for a single document object in isolation, but I worry.
Consider a program that builds a document, then extracts some nodes from it for manipulation, then hands off the document to another thread (via Send) while keeping the nodes in the parent thread. It is very easy to cause a race condition - mutating a Node in the parent thread and the same node in the child thread. For Document to be Send I would expect all of its pieces to be sent along with it (or ensure they are safe to live in all threads, via Sync).
I had tried my hand at a Sync interface back in #46 but didn't get too far. But I'm open to ideas.
If you don't need mutability, I could imagine extending the read-only treatment for nodes, which added a thread-safe RoNode struct mirroring Node also for document. So a new RoDocument struct, which is thread-safe, but disallows mutating anything inside the document (so not too useful beyond getting the root node as a RoNode and passing it on - which you can do already).
. It is very easy to cause a race condition - mutating a Node in the parent thread and the same node in the child thread.
Correct. And that's exactly the distinction between Sync and Send. My use case is a bit different: I want to have a "cache" of parsed (immutable) XML documents (basically a HashMap<Document>) and to allow multiple threads run XPATH queries I need Sync (solvable by adding a mutex), and Send. Right now, the cache stores str and each thread parses it again and again.
For
Documentto beSendI would expect all of its pieces to be sent along with it (or ensure they are safe to live in all threads, viaSync).
Agreed. But Sync can be supplied by lib users via Mutex.
I had tried my hand at a
Syncinterface back in #46 but didn't get too far. But I'm open to ideas.
Agreed. Native Sync is definitely out of question.
If you don't need mutability, I could imagine extending the read-only treatment for nodes, which added a thread-safe
RoNodestruct mirroringNodealso for document. So a newRoDocumentstruct, which is thread-safe, but disallows mutating anything inside the document (so not too useful beyond getting the root node as aRoNodeand passing it on - which you can do already).
My use case is basically evaluating XPATHs, so no mutation. And yeah, I think RoDocument would do the trick.
It is very easy to cause a race condition - mutating a Node in the parent thread and the same node in the child thread.
Correct. And that's exactly the distinction between Sync and Send
Actually no, I don't think that is the difference. A Send object must be safely transferable to a thread (where no mutability remains possible in the parent thread), a Sync object must have a safe mutability interface that can be used cross-thread. In our libxml2 wrapper's case, neither interface is possible to guarantee if any mutability is allowed at all.
For Document to be safe to Send, the Rust philosophy is to only be able to send it over if no mutability is even theoretically possible on the parent thread.
So yes, RoDocument is the only way this would be possible with the current wrapper design. Until a PR for that lands, you are welcome to grab the root node as an RoNode and then Send that around as needed.
let root: RoNode = doc.get_root_readonly().unwrap();