Further relax extract_sequence for the upcoming release 0.18
These are the two less controversial parts, i.e. turning anything that can be iterated into a `Vec`, whether it calls itself a sequence or not, and also accepting `str`, where the changelog entry gives a concrete hint on how to handle the str-as-`Vec<String>` edge case ergonomically, as is common in Python APIs.
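For illustration, a small Python-side sketch of the str edge case in question: a `str` is itself an iterable of one-character strings, so a fully generic "iterate anything" conversion would silently turn `"abc"` into `["a", "b", "c"]`. The `as_str_list` helper below is purely hypothetical (not PyO3 API), showing the kind of ergonomic handling a caller or wrapper can apply:

```python
# A str is an iterable of one-character strings, so generic iteration
# turns "abc" into ["a", "b", "c"] rather than treating it as one value.
print(list("abc"))  # ['a', 'b', 'c']

# Hypothetical helper: wrap a lone string before handing it to an API
# that expects a list of strings (the str-as-Vec<String> edge case).
def as_str_list(value):
    return [value] if isinstance(value, str) else list(value)

print(as_str_list("abc"))       # ['abc']
print(as_str_list(["a", "b"]))  # ['a', 'b']
```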
So I'm still unsure about this. I think the removal of the `str` special-case handling could be included in 0.17.2 (it was added in 0.17.0). So that could be separated out too if you feel strongly that was incorrect.
The broader relax... is it too much? Passing e.g. a `dict` will then get a `Vec` of the keys. Is that helpful? I'm not sure. All opinions on this change would be welcome.
> So I'm still unsure about this. I think the removal of the `str` special-case handling could be included in 0.17.2 (it was added in 0.17.0). So that could be separated out too if you feel strongly that was incorrect.
Personally, my primary motivation to change anything here is making NumPy users happy and that we already did. This follow-up is mainly about furthering the discussion on what our approach to type extraction should be.
I like the general approach of just requiring iterability and dislike the str special case. But I will certainly not throw a tantrum if we do not change either of these.
> The broader relax... is it too much? Passing e.g. a `dict` will then get a `Vec` of the keys. Is that helpful? I'm not sure. All opinions on this change would be welcome.
I would also very much like an overview of the opinions of interested users and/or the other maintainers.
I am cc'ing the people already involved in the original issue trying to extract Vec from NumPy arrays: @mtreinish @ritchie46 @aganders3
> Passing e.g. a `dict` will then get a `Vec` of the keys. Is that helpful? I'm not sure.
I have to admit that I consider the fact that `iter(dict)` yields an iterator of the keys instead of the key-value pairs to be a historical Python edge case, similar to the str-can-be-iterated-as-str thing. Getting a `Vec` of key-value pairs would definitely be useful IMHO, but that is ruled out by the existing Python behaviour.
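Concretely, on the Python side (standard behaviour, nothing PyO3-specific):

```python
# Iterating a dict yields its keys, not its key-value pairs.
d = {"a": 1, "b": 2}
print(list(d))          # ['a', 'b']

# Pairs must be requested explicitly via .items():
print(list(d.items()))  # [('a', 1), ('b', 2)]
```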
I haven't really been keeping up with this sequence stuff, but...
> Passing e.g. a `dict` will then get a `Vec` of the keys. Is that helpful?
...sounds like a big footgun. That's not what I'd expect when asking pyo3 to convert things into a `Vec` (rather, I'd expect a `TypeError`).
> rather, I'd expect a `TypeError`
I think the usefulness of type errors is more nuanced here than for, say, `impl PyTryFrom for PySequence`. Constructing a `PySequence` where later method calls mostly fail because the underlying Python object is not really a sequence is arguably worse than an upfront type error. But in this case, the resulting `Vec<T>` is perfectly usable, as it is a freshly constructed `Vec<T>` like any other, so this reasoning does not apply.
What the type error does yield for `FromPyObject` is the ability to try another branch of an outer `FromPyObject` impl which may be a better fit for the underlying Python object. But the order in which these branches are tried is up to the Rust program, e.g. by changing the order of enum variants, so one can already try to extract a `PyDict` before "falling back" to a `Vec` to handle this case. (Especially since one would have to implement this exactly the same way in pure Python, i.e. first check whether the given value is a `dict` before trying something like `list(val)`, because that works for dictionaries as well.)
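In pure Python, that ordering concern looks roughly like this hypothetical sketch, mirroring trying a `PyDict` enum branch before the `Vec` one:

```python
def extract(val):
    # The dict branch has to come first: list(val) also succeeds for a
    # dict, but would silently yield only the keys.
    if isinstance(val, dict):
        return dict(val)
    return list(val)

print(extract({"a": 1}))  # {'a': 1}
print(extract(range(3)))  # [0, 1, 2]
# A genuinely non-iterable value still raises the top-level TypeError:
# extract(42)  # -> TypeError
```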
EDIT: I did not spell that out above, but I think a top-level type error returned when calling into a Rust extension is not particularly useful because, well, the call didn't work then.
I noticed this is being discussed in #2785 and figured I'd give my 2¢.
This behavior makes sense to me as a mostly Python-brained developer, but I understand the controversy. It feels more ergonomic than having to call `list(ob)` in Python before passing the object into a function expecting a `Vec` (which is my understanding of a possible workaround).
My only nitpick (which I kind of brought up in #2615) is with the naming/location, as `extract_sequence` would now be more about whether the object is iterable than whether it's a sequence. This is tangential to the more important question of whether this is the behavior we want, though.
I'm still undecided what the right thing to do is. It should be noted that what we do is currently almost the same as pybind11 does, which is a plus for ecosystem consistency in my opinion.
Arguments for:
- Allowing any Python iterable has potential performance advantages by avoiding the intermediate Python collection.
- A Rust `Vec` has to be created anyway as part of this conversion, so requiring a concrete Python type also seems wasteful.
Arguments against:
- It's a breaking change (would make many PyO3 libraries accept more types suddenly).
- I think it's harder to take this away again later if we decide it was an error to do so. (We saw how much breakage we got just from not accepting `numpy` arrays by accident; imagine how many call sites could break if we stopped accepting all iterables.)
- Semantically, Rust's `Vec` is roughly akin to Python's `Sequence`, so the existing behaviour does seem correct from a philosophical angle.
I wonder if, rather than changing the existing behaviour, it's better if we could add a type `PyIterable<T>` which can then have easy iteration access? Very similar to our existing `PyIterator` but with type safety.
(I've been wondering for a while about experimenting with a new crate which provides generic variants of existing PyO3 types, e.g. `PyList<T>`.)
> Semantically, Rust's `Vec` is roughly akin to Python's `Sequence`, so the existing behaviour does seem correct from a philosophical angle.
Personally, I think of `FromPyObject::extract` as constructing a Rust type based on a Python value rather than establishing type-level equivalence between Rust and Python. I think that also fits with the result of that operation usually being independent of the Python value it originated from, although we admittedly have it perform "double duty" for argument conversion purposes. From this point of view, `<Vec as FromPyObject>::extract` would be similar to Python's `list(iterable)` constructor.
> I wonder if rather than changing the existing behaviour, it's better if we could add a type `PyIterable<T>` which can then have easy iteration access? Very similar to our existing `PyIterator` but with type safety.
I think this is almost orthogonal to the question posed here, i.e. if I want to work with `Vec<T>` in the end, taking `vals: &PyAny` and calling `let vals = vals.iter()?.map(|val| val.extract()).collect::<PyResult<Vec<T>>>()?;` would have almost the same ergonomics IMHO.
> I'm still undecided what the right thing to do is.
All that being said, I think that we should drop the change if there is this much uncertainty attached to it. I fear that it takes up scarce review bandwidth that is needed elsewhere, and the status quo certainly serves us insofar as there is no stream of bug reports on this behaviour.
Ok. I am always sorry to see a contribution not be merged, as there is always an underlying motivation as well as time invested.
> Although we admittedly have it perform "double duty" for argument conversion purposes. From this point of view, `<Vec as FromPyObject>::extract` would be similar to Python's `list(iterable)` constructor.
I think you've hit the nail on the head here, and this perhaps suggests an avenue for future exploration another time. There are different levels of conversion which users can want on function inputs. The current PyO3 semantics are usually to accept the equivalent Python type, which I'll call the "narrow" conversion range. The proposal here is more akin to the "wide" conversion range. Implicit conversions like those asked about in #2482 are also flavours of "wide" conversions.
I would say pybind11 defaults to "wide" conversions and has a `.noconvert()` option to reduce to "narrow" mode.
It is already easy for PyO3 users to use newtypes and `#[pyo3(from_py_with = ...)]` to modify conversion behaviour, so I think there's less urgency to support something like `.noconvert()`. At the same time, these sorts of proposals do suggest there is a wish for more. We can definitely continue to be open to better designs to offer in the framework.
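To make the "narrow" vs. "wide" distinction concrete, here is a hypothetical Python sketch; the `convert_narrow`/`convert_wide` helpers are purely illustrative, not PyO3 or pybind11 API:

```python
from collections.abc import Iterable, Sequence

def convert_narrow(obj):
    # "Narrow": accept only actual sequences, akin to requiring the
    # equivalent Python type (str is excluded deliberately).
    if isinstance(obj, Sequence) and not isinstance(obj, str):
        return list(obj)
    raise TypeError(f"expected a sequence, got {type(obj).__name__}")

def convert_wide(obj):
    # "Wide": accept anything iterable, like Python's list(iterable).
    if isinstance(obj, Iterable):
        return list(obj)
    raise TypeError(f"expected an iterable, got {type(obj).__name__}")

print(convert_narrow([1, 2]))  # [1, 2]
print(convert_wide({"a": 1}))  # ['a']  (a dict is iterable)
# convert_narrow({"a": 1})     # -> TypeError: a dict is not a Sequence
```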