rustworkx icon indicating copy to clipboard operation
rustworkx copied to clipboard

Add `with_capacity` constructors in Python space

Open jakelishman opened this issue 11 months ago • 8 comments

This is just a simple exposure of the allocation control from Rust space to Python space.

I was playing around with this for DAG construction in Qiskit and it (surprisingly to me) didn't have a meaningful effect on performance for what I was doing, but even so, it just feels like something I'd like to be able to expose there as well.

jakelishman avatar Aug 29 '23 11:08 jakelishman

Pull Request Test Coverage Report for Build 9619945870

Details

  • 42 of 42 (100.0%) changed or added relevant lines in 2 files are covered.
  • 7 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.02%) to 95.884%

Files with Coverage Reduction New Missed Lines %
src/iterators.rs 7 75.59%
<!-- Total: 7
Totals Coverage Status
Change from base Build 9604084112: -0.02%
Covered Lines: 17472
Relevant Lines: 18222

💛 - Coveralls

coveralls avatar Aug 29 '23 11:08 coveralls

I’ll let @mtreinish give his opinion as well but I am against exposing this to end-users.

Mostly because Python users should not worry about it. For example, Python lists don’t let you pick a capacity. I think we are leaking a detail from Rust into Python

Also, I think for most of the Qiskit DAG code we spent more time on Rust <-> Python conversion than anything else. For some algorithms it takes more time to convert objects than to actually run the core logic

IvanIsCoding avatar Aug 29 '23 22:08 IvanIsCoding

That's fair - it's always going to be the case that the other stuff we're doing is slower, it just seemed unnecessary to me to pay the resize cost even as a papercut when I know how many nodes and edges I'm about to generate to put into my graph.

jakelishman avatar Aug 30 '23 10:08 jakelishman

That's fair - it's always going to be the case that the other stuff we're doing is slower, it just seemed unnecessary to me to pay the resize cost even as a papercut when I know how many nodes and edges I'm about to generate to put into my graph.

I could see a constructor for PyGraph/PyDiGraph which accepts nodes/edges, a little bit similar to what extend_from_edge_list does now

But if we add with_capacity then we need to explain to the end user what is a capacity, why it is different from the number of nodes/edges, etc. Which I would rather not do

IvanIsCoding avatar Aug 30 '23 12:08 IvanIsCoding

I have a generator expression (or equivalent opaque iterable) in Python-space (technically it's actually generated in Rust space, but we need to go via DAGCircuit, so that's irrelevant here) whose length I know ahead of time, but I don't actually want to generate Python-space allocated storage for the collection ahead of time, especially since that would then have copy semantics on entry to extend_from_edge_list.

Feel free to close the PR if you don't want to deal with explaining capacity to users - I have no proof that it'd have any measurable impact right now anyway.

jakelishman avatar Aug 30 '23 12:08 jakelishman

For me I think we probably want to expose this functionality in some way for performance. Saving on memory allocations can save a bunch of runtime especially for very large graphs. But, I also share Ivan's concerns around exposing too much internal details to the UI. I'm wondering if there is a compromise here on the interface, like instead of doing with_capacity adding kwargs to the constructor like target_node_count and target_edge_count and document the performance benefit and behavior there

mtreinish avatar Aug 30 '23 12:08 mtreinish

I'm happy to retarget this PR to whatever interface you guys would prefer, if it's something you'd consider adding. From my side, my ask is only that I can reserve space for nodes and/or edges before I actually provide them - anything that lets me do that, no matter how wordy / hidden, is fine for me, if you guys agree to add it.

(and I can of course live with it on our side if it's not added at all, for the above reasons)

jakelishman avatar Aug 30 '23 13:08 jakelishman

What about initial_node_count and initial_edge_count? That indicates to a Python user that it's not a limit, but a starting point. I do think it's probably worthwhile, especially since we now have a lot of hybrid Python/Rust developers on the qiskit side and probably elsewhere as well.

Or could call the method initial_size and just use node_count and edge_count. Or ...

enavarro51 avatar Nov 01 '23 16:11 enavarro51

Updated to use the new forms suggested above.

jakelishman avatar May 15 '24 15:05 jakelishman

FWIW, I also like the ability to specify the number of expected nodes/edges when creating rustworkx graphs, so I am in favor of this PR.

alexanderivrii avatar May 23 '24 10:05 alexanderivrii