rustworkx
Serializing and deserializing graphs changes the order of the edges
What is the expected enhancement?
The current serialization mechanism of PyDiGraph doesn't preserve the original order of edges in the graph. This creates a problem for libraries that store the edge index inside the edge weight object. (Why store the edge index there? It is easier to work with for graphs whose nodes and edges carry many properties, such as Wikidata.) A graph is serialized and deserialized when it is used in a multiprocess environment. After deserialization, the order of the edges changes, so the edge indices stored inside the edge weights no longer match the new indices, which corrupts the graph.
I took a look at the serialization code here: https://github.com/Qiskit/retworkx/blob/a0aaa1f93ef21fe77dccee63cd37fdb8f43c7cee/src/digraph.rs#L283-L291 and I think we can maintain the original order by dumping the edges in a separate loop. This shouldn't increase the runtime and may even be faster.
If you are okay with my proposal, I am happy to create a PR for this if you don't have time. Thanks.
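To make the idea concrete, here is a minimal pure-Python sketch of why the dump order matters. It is not the rustworkx code: the edge list, `dump_grouped_by_node`, `dump_in_edge_order`, and `restore` are hypothetical names, and it assumes that restoring re-adds edges one by one in dump order, so each edge gets the next free index.

```python
# Toy model (not rustworkx): the position in this list is the edge index.
edges = [(2, 0, "a"), (0, 1, "b"), (2, 1, "c"), (1, 2, "d")]
num_nodes = 3


def dump_grouped_by_node(edges, num_nodes):
    """Collect edges while walking the nodes, grouping them by source node."""
    out = []
    for node in range(num_nodes):
        for source, target, weight in edges:
            if source == node:
                out.append((source, target, weight))
    return out


def dump_in_edge_order(edges):
    """Collect edges in a separate loop that follows the edge indices."""
    return list(edges)


def restore(dumped):
    """Re-adding edges in dump order assigns indices 0, 1, 2, ..."""
    return dict(enumerate(dumped))


original = restore(edges)
grouped = restore(dump_grouped_by_node(edges, num_nodes))
ordered = restore(dump_in_edge_order(edges))

print(grouped == original)  # grouping by node reorders the edges
print(ordered == original)  # the separate edge loop keeps every index
```

In this toy model, edge "a" starts at index 0 but ends up at index 2 after the node-grouped round trip, which is the kind of mismatch described above for indices stored inside edge weights.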
Yeah, this is a good catch. We should be reconstructing the edge indices exactly via __getstate__ and __setstate__; right now it doesn't do that. I think probably the only catch will be graphs with edge removals: we'll have to reconstruct the holes in the indices caused by the removals. We do that for nodes already, so we can adapt that code to work with edges too.
If you're willing to open a PR to fix this, please go ahead, that would be great! We can target getting this out in a 0.11.1 release.
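As a rough illustration of the "holes" point, the sketch below uses a hypothetical `ToyGraph` class (not the rustworkx implementation) whose `__getstate__` and `__setstate__` keep removed edge slots as `None`, so the surviving edges get back the same indices. It assumes a simplified model where new edges are always appended and removed slots are never reused.

```python
class ToyGraph:
    """Hypothetical stand-in for a graph with stable edge indices."""

    def __init__(self):
        # One slot per edge index; None marks a hole left by a removal.
        self.edge_slots = []

    def add_edge(self, source, target, weight):
        self.edge_slots.append((source, target, weight))
        return len(self.edge_slots) - 1

    def remove_edge(self, index):
        self.edge_slots[index] = None

    def edge_indices(self):
        return [i for i, slot in enumerate(self.edge_slots) if slot is not None]

    def __getstate__(self):
        # Dump every slot, holes included, in index order.
        return {"edges": list(self.edge_slots)}

    def __setstate__(self, state):
        # Rebuild the slots as dumped, so holes stay where they were.
        self.edge_slots = list(state["edges"])


def roundtrip(graph):
    """Call the same hooks pickle would use, without touching the disk."""
    clone = ToyGraph.__new__(ToyGraph)
    clone.__setstate__(graph.__getstate__())
    return clone


g = ToyGraph()
g.add_edge(0, 1, {"index": 0})
g.add_edge(1, 2, {"index": 1})
g.add_edge(2, 0, {"index": 2})
g.remove_edge(1)  # leaves a hole at index 1

restored = roundtrip(g)
print(restored.edge_indices())
```

If the state dropped the `None` entry instead, the edge stored at index 2 would come back at index 1 and no longer agree with the index kept in its weight.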
Ok, I submitted a PR that deals with the hole issue as you mentioned.
petgraph also has an implementation for serializing its internal state, but since PyObject doesn't implement the Serialize trait, I can't use petgraph's serialization. They have a function, into_serializable, that could be used as a workaround for the PyObject issue, and I think it would be faster. Unfortunately, their IntoSerializable trait is private.