pybind11 Make stl.h `list|set|map_caster` more user friendly.

Description

(The equivalent of this PR was merged as https://github.com/google/pywrapcc/pull/30045 on July 17, 2023.)

With this PR, it is no longer necessary to explicitly convert Python iterables to tuple(), set(), or map() in several common situations, for example:

-    fn(tuple({"x": 8, "y": 11}.values()))
+    fn({"x": 8, "y": 11}.values())

-    fn(set({5: None, 12: None}.keys()))
+    fn({5: None, 12: None}.keys())

-    fn(dict(PyMappingWithItems().items()))
+    fn(PyMappingWithItems())

See https://github.com/pybind/pybind11/pull/4686#issuecomment-1636838113 below for a more complete demonstration of the behavior differences. (Note that the diff in that comment is reversed, what is + here is - there and vice versa.)

This PR strikes a careful compromise between being user friendly and being safe. This compromise was informed by 10s of thousands of use cases in the wild (Google codebase, which includes a very large number of open source third-party packages). Concretely, in connection with PyCLIF-pybind11 integration work, the behaviors of PyCLIF (http://github.com/google/clif) and pybind11 needed to be converged. Originally PyCLIF was extremely far on the permissive side of the spectrum, by accepting any Python iterable as input for a C++ vector/set/map argument, as long as the elements were convertible. The obvious (in hindsight) problem was that any empty Python iterable could be passed to any of these C++ types, e.g. {} was accepted for C++ vector/set arguments, or [] for C++ map arguments. To fix this, the code base was cleaned up, and gating functions were added to enforce the desired behavior:

https://github.com/google/clif/blob/ae4eee1de07cdf115c0c9bf9fec9ff28efce6f6c/clif/python/runtime.cc#L388-L438

The exact same gating functions are adopted here. Without this behavior change in pybind11, it would be necessary to insert a very large number of explicit tuple(), set(), map() conversions in the Google codebase, to complete the PyCLIF-pybind11 integration work. While it is very easy to tell new users to pass e.g. tuple(iterable) instead of iterable, it is not easy to convince hundreds of happy existing PyCLIF users to accept retroactively inserting the explicit conversions. The obvious question that will come back: Why do we have to clutter our code like this? There is no good reason. The exact implementation of the gating functions may seem a little arbitrary at first sight, but is in fact informed by what amounts to a large-scale field experiment, with the originally very permissive behavior of PyCLIF.

Suggested changelog entry:

stl.h `list|set|map_caster` were made more user friendly: it is no longer necessary to explicitly convert Python iterables to `tuple()`, `set()`, or `map()` in several common situations.

May 30 '23 20:05 rwgk

Close-reopen to trigger GHA, after adding python dev label.

Jun 10 '23 04:06 rwgk

Original PR title and description (outdated)

Title: WIP: Change list_caster (stl.h) to also accept generator objects.

Description:

WIP — It is still undecided if this is the best compromise.

An easy way to understand the pybind11 behavior change is to look at the small test change under commit 7d30378bdffa54dc05f8166adf8dcf045a57d526,

     fn = m.pass_std_vector_int
-    with pytest.raises(TypeError):
-        fn(i for i in range(3))
+    assert fn(i for i in range(3)) == 3

What does this help?

This PR resolves a very large number of global test failures in connection with the PyCLIF-pybind11 integration work. The exact number is still to be determined, but estimated to exceed 200k. The number of root causes is difficult to estimate, order of magnitude 100 probably.

It is very easy to tell new users to pass tuple(generator_object) instead of generator_object to wrapped functions with std::vector or std::array arguments.
It is not easy to convince hundreds of happy existing PyCLIF users to retroactively insert tuple().

The large number of existing use cases is also a signal that the feature is definitely useful in aggregate.

For full context:

This PR is a much more conservative than google/pywrapcc#30042 (rolled back), and safer, because generator objects are guaranteed to be fully consumed, behavior-equivalent to passing tuple(generator_object).
There was exactly one global test failure for google/pywrapcc#30042:
- https://github.com/pytorch/pytorch/blob/3263bd24bee43b4e2c263c0076a2136de6ead947/test/test_native_functions.py#L112

The same test passes with this PR.

2023-07-15 09:55 AM PDT: Global testing with commit 6b5c780ace53eb78deb1273043610be59bb571b9 passed (Google-internal test id OCL:538028927:BASE:548369355:1689440116187:326f581c)
2023-06-07 10:53 AM PDT: Global testing with commit b4f767bfb2de96d52c52ce1a6a406299a044b529 passed (Google-internal test id OCL:538028927:BASE:538529638:1686159334502:31d711fe).
2023-05-31 6:39 AM Global testing with commit 7d30378bdffa54dc05f8166adf8dcf045a57d526 passed (Google-internal test id OCL:536396745:BASE:536692592:1685539068476:4a666603).

Jul 15 '23 17:07 rwgk

Demonstration of behavior changes

New tests in test_stl.py adjusted to work with current master (@ 0620d716382a5619124d8f9d7a6187825a70f018).

diff --git a/tests/test_stl.py b/tests/test_stl.py
index 43615843..4bc4ce6e 100644
--- a/tests/test_stl.py
+++ b/tests/test_stl.py
@@ -386,11 +386,11 @@ def test_return_vector_bool_raw_ptr():
 )
 def test_pass_std_vector_int(fn, offset):
     assert fn([7, 13]) == 140 + offset
-    assert fn({6, 2}) == 116 + offset
-    assert fn({"x": 8, "y": 11}.values()) == 138 + offset
-    assert fn({3: None, 9: None}.keys()) == 124 + offset
-    assert fn(i for i in [4, 17]) == 142 + offset
-    assert fn(map(lambda i: i * 3, [8, 7])) == 190 + offset  # noqa: C417
+    assert fn(tuple({6, 2})) == 116 + offset
+    assert fn(tuple({"x": 8, "y": 11}.values())) == 138 + offset
+    assert fn(tuple({3: None, 9: None}.keys())) == 124 + offset
+    assert fn(tuple(i for i in [4, 17])) == 142 + offset
+    assert fn(tuple(map(lambda i: i * 3, [8, 7]))) == 190 + offset  # noqa: C417
     with pytest.raises(TypeError):
         fn({"x": 0, "y": 1})
     with pytest.raises(TypeError):
@@ -399,8 +399,8 @@ def test_pass_std_vector_int(fn, offset):

 def test_pass_std_vector_pair_int():
     fn = m.pass_std_vector_pair_int
-    assert fn({1: 2, 3: 4}.items()) == 406
-    assert fn(zip([5, 17], [13, 9])) == 2222
+    assert fn(tuple({1: 2, 3: 4}.items())) == 406
+    assert fn(tuple(zip([5, 17], [13, 9]))) == 2222


 def test_list_caster_fully_consumes_generator_object():
@@ -416,7 +416,7 @@ def test_list_caster_fully_consumes_generator_object():
 def test_pass_std_set_int():
     fn = m.pass_std_set_int
     assert fn({3, 15}) == 254
-    assert fn({5: None, 12: None}.keys()) == 251
+    assert fn(set({5: None, 12: None}.keys())) == 251
     with pytest.raises(TypeError):
         fn([])
     with pytest.raises(TypeError):
@@ -466,7 +466,7 @@ class FakePyMappingGenObj(FakePyMappingMissingItems):
 def test_pass_std_map_int():
     fn = m.pass_std_map_int
     assert fn({1: 2, 3: 4}) == 4506
-    assert fn(FakePyMappingWithItems()) == 3507
+    assert fn(dict(FakePyMappingWithItems().items())) == 3507
     with pytest.raises(TypeError):
         fn(FakePyMappingMissingItems())
     with pytest.raises(TypeError):

Jul 15 '23 17:07 rwgk

For easy reference: The equivalent of this PR was reviewed and merged under https://github.com/google/pywrapcc/pull/30045.

If the review here leads to more changes, I'll port them to pywrapcc after merging here.

Jul 17 '23 22:07 rwgk

Thanks for the feedback @henryiii!

To explain briefly: I was completely focused on making pybind11 more permissive "just enough" to avoid large-scale changes around the Google codebase, changes that would probably be viewed as annoying regressions by users.

I agree aligning with collections.abc is fundamentally better, but that didn't even cross my mind before. I'll work on that asap.

Nov 15 '23 20:11 rwgk

@nevedaren (new owner of Python/C++ bindings at Google) FYI

Hi @henryiii, @Skylion007, @EthanSteinberg, this is the second of two PRs that I could not easily remove from the Google codebase (the other one was PR #4597).

I'm coming back to this PR at this time because I'm leaving Google soon. I'm working intensely on making the hand-over as smooth as possible, and leaving pybind11 at Google in an easily maintainable state.

A few days ago (Aug 7), I already switched Google back from the github.com/google fork to the new (#5257) smart_holder branch, but I had to carry this PR as a patch. Merging it now will remove that last patch. Please let me know if you think this could cause a problem. — I believe problems are very unlikely, because this PR has been in production use at Google for over a year, and it was reviewed by Google engineers before it was deployed.

In addition to what I wrote in the PR description above, in the meantime new client code - manually written pybind11 bindings - became dependent on this PR. It wouldn't be too difficult to mail out workarounds (explicitly inserting list(...) etc.) and then remove the patch, but we can only get what is unit tested. It's quite plausible that production code passes e.g. a generator while unit tests pass only lists, tuples, maps. There could be production breakages even if we do all our due diligence.

I don't really want do extra work and take that risk especially because it seems that the feature change is actually considered useful:

Looking at the comments here again (in particular this), I believe the feedback implies that eliminating the need for explicit list(...) etc. conversions is considered useful, because @henry suggested it could be generalized. That's definitely true, but:

From the (unintentional) "field experiment" explained in the PR description it's very clear that generalizations would see very little practical use. — Back in June/July 2023 there were only ~10 fixes that I needed to make around the gigantic Google codebase, to keep the scope of this PR as lean and runtime-efficient as you see it here.
If we decide later that it's worth the additional code complexity anyway, this PR is a clean and meaningful starting point for adding in generalizations. In no way does it make generalizations more difficult.

Aug 10 '24 20:08 rwgk

Thanks Henry!

Aug 13 '24 18:08 rwgk

pybind11 pybind11 copied to clipboard

Make stl.h `list|set|map_caster` more user friendly.

Description

Suggested changelog entry:

Original PR title and description (outdated)

Demonstration of behavior changes

pybind11
pybind11 copied to clipboard