rustworkx
rustworkx copied to clipboard
Add `with_capacity` constructors in Python space
This is just a simple exposure of the allocation control from Rust space to Python space.
I was playing around with this for DAG construction in Qiskit and it (surprisingly to me) didn't have a meaningful effect on performance for what I was doing, but even so, it just feels like something I'd like to be able to expose there as well.
Pull Request Test Coverage Report for Build 9619945870
Details
- 42 of 42 (100.0%) changed or added relevant lines in 2 files are covered.
- 7 unchanged lines in 1 file lost coverage.
- Overall coverage decreased (-0.02%) to 95.884%
Files with Coverage Reduction | New Missed Lines | % |
---|---|---|
src/iterators.rs | 7 | 75.59% |
<!-- | Total: | 7 |
Totals | |
---|---|
Change from base Build 9604084112: | -0.02% |
Covered Lines: | 17472 |
Relevant Lines: | 18222 |
💛 - Coveralls
I’ll let @mtreinish give his opinion as well but I am against exposing this to end-users.
Mostly because Python users should not worry about it. For example, Python lists don’t let you pick a capacity. I think we are leaking a detail from Rust into Python
Also, I think for most of the Qiskit DAG code we spent more time on Rust <-> Python conversion than anything else. For some algorithms it takes more time to convert objects than to actually run the core logic
That's fair - it's always going to be the case that the other stuff we're doing is slower, it just seemed unnecessary to me to pay the resize cost even as a papercut when I know how many nodes and edges I'm about to generate to put into my graph.
That's fair - it's always going to be the case that the other stuff we're doing is slower, it just seemed unnecessary to me to pay the resize cost even as a papercut when I know how many nodes and edges I'm about to generate to put into my graph.
I could see a constructor for PyGraph/PyDiGraph which accepts nodes/edges, a little bit similar to what extend_from_edge_list
does now
But if we add with_capacity
then we need to explain to the end user what is a capacity, why it is different from the number of nodes/edges, etc. Which I would rather not do
I have a generator expression (or equivalent opaque iterable) in Python-space (technically it's actually generated in Rust space, but we need to go via DAGCircuit
, so that's irrelevant here) whose length I know ahead of time, but I don't actually want to generate Python-space allocated storage for the collection ahead of time, especially since that would then have copy semantics on entry to extend_from_edge_list
.
Feel free to close the PR if you don't want to deal with explaining capacity to users - I have no proof that it'd have any measurable impact right now anyway.
For me I think we probably want to expose this functionality in some way for performance. Saving on memory allocations can save a bunch of runtime especially for very large graphs. But, I also share Ivan's concerns around exposing too much internal details to the UI. I'm wondering if there is a compromise here on the interface, like instead of doing with_capacity
adding kwargs to the constructor like target_node_count
and target_edge_count
and document the performance benefit and behavior there
I'm happy to retarget this PR to whatever interface you guys would prefer, if it's something you'd consider adding. From my side, my ask is only that I can reserve space for nodes and/or edges before I actually provide them - anything that lets me do that, no matter how wordy / hidden, is fine for me, if you guys agree to add it.
(and I can of course live with it on our side if it's not added at all, for the above reasons)
What about initial_node_count
and initial_edge_count
? That indicates to a Python user that it's not a limit, but a starting point. I do think it's probably worthwhile, especially since we now have a lot of hybrid Python/Rust developers on the qiskit side and probably elsewhere as well.
Or could call the method initial_size
and just use node_count
and edge_count
. Or ...
Updated to use the new forms suggested above.
FWIW, I also like the ability to specify the number of expected nodes/edges when creating rustworkx graphs, so I am in favor of this PR.