pybind11 icon indicating copy to clipboard operation
pybind11 copied to clipboard

Rework function signatures in docstrings of overloaded functions

Open AWhetter opened this issue 5 years ago • 48 comments
trafficstars

Description

This pull request changes the docstring generation so that all function signatures are prepended to the docstring of an overloaded function, instead of the generic signature that is prepended currently. It also includes an additional change that allows the section headings in the docstring to be disabled independently of whether or not function signatures are shown. To keep backwards compatibility, calling disable_function_signatures() will still turn off both the prepended signatures and the section headings. Therefore section headings will only be shown if both function signatures and section headings are enabled. To rephrase, section headings cannot be enabled independently of disabling function signatures.

Closes #2619

Suggested changelog entry:

* All function signatures of an overloaded function are now prepended to the function's docstring.
  To support this change, a new ``disable_section_headings()`` function has been added to the ``options`` class to allow
  the section headings in the docstring of an overloaded function to be disabled.
  `#2621 <https://github.com/pybind/pybind11/pull/2621>

AWhetter avatar Oct 26 '20 03:10 AWhetter

Messing around with this myself, it seems Sphinx is expecting a backslash at the end of the line, if the following is still supposed to be part of the overloaded signature (which is quite ugly, in the raw docstring, but...). Apart from that, things seem to work.

YannickJadoul avatar Jan 05 '21 20:01 YannickJadoul

Nice find! I'm not sure how I missed that before. But a small end-to-end test is working as it should for me now.

which is quite ugly, in the raw docstring, but...

You're allowed to have a space before the backslash. I haven't done that here but if you think it looks better with the space there then let me know and I can change it.

AWhetter avatar Jan 16 '21 01:01 AWhetter

@AWhetter, ah, it does allow that space? Didn't know, but that might be slightly less ugly, yes. I think it still ugly, though (and I don't fully understand why it's necessary; Sphinx could just try parsing the next line and check if it's a signature, just like it does for the first line?), but yes, I've also made a similar ad-hoc change to the pybind11 subtree in my project, now, so thanks for pointing me to this! :-)

Is this ready for review? Guess this is only scheduled for 2.7, anyway, but I could go over this after having played around with it myself :-)

YannickJadoul avatar Jan 16 '21 13:01 YannickJadoul

I've just squashed the commits back up, so this is now ready for review.

AWhetter avatar Jan 17 '21 00:01 AWhetter

To rephrase, section headings cannot be enabled independently of disabling function signatures.

~Just a thought: as you say, this PR is making two changes: 1) it is adding the overloaded signatures to be recognized by e.g. Sphinx, and 2) it is removing these "section headings".~

~What about splitting this into two? I would expect that the different sections are still useful if people have different docstrings for different overloads, but that having the overloaded signatures at the start is almost always better (and so we could turn that to on-by-default, maybe)?~

~It's just an idea. I'm not sure I like the option-class to become too fine-grained either, actually, but...~

We might want @wjakob to weight in on this as well, actually.

EDIT: Sorry, scratch all that. This is the case, already, if I look more closely at the implementation. Except that the multiple signatures aren't optional :-)

YannickJadoul avatar Jan 19 '21 23:01 YannickJadoul

Related to your comment in #2619: it's a bit of a pity that CPython's docstrings don't have the trailing \ :-/ Did you check that stubgen handles this new version with \ correctly as well?

YannickJadoul avatar Jan 20 '21 00:01 YannickJadoul

By chance, stubgen works, but only when the signature headings are included. stubgen isn't capable of parsing the backslash separated signatures at the start of the docstring, so it ignores them, but it can parse the section headings. So the original issue I described of stubgen outputting myfunc(*args, **kwargs) ->Any no longer happens.

I still think there's value in making the section headings in a docstring optional, but I'm more worried about resolving the issue of stubgen outputting the *args, **kwargs signature. If we do decide that making the section headings optional is worth it, I've submitted a pull request to stubgen to support that: https://github.com/python/mypy/pull/10019

AWhetter avatar Feb 04 '21 06:02 AWhetter

By chance, stubgen works, but only when the signature headings are included. stubgen isn't capable of parsing the backslash separated signatures at the start of the docstring, so it ignores them, but it can parse the section headings. So the original issue I described of stubgen outputting myfunc(*args, **kwargs) ->Any no longer happens.

Ugh. I had just thrown out the headings :-(

I still think there's value in making the section headings in a docstring optional, but I'm more worried about resolving the issue of stubgen outputting the *args, **kwargs signature.

Agreed. Does it make sense to submit an issue to stubgen or Sphinx (or both, such that they communicate!) to make sure that they act the same? :-/ (i.e., with \ or without \; I don't really care either way, though without \ does look nicer to me)

YannickJadoul avatar Feb 04 '21 12:02 YannickJadoul

Does it make sense to submit an issue to stubgen or Sphinx (or both, such that they communicate!) to make sure that they act the same? :-/ (i.e., with \ or without \; I don't really care either way, though without \ does look nicer to me)

Oh. python/mypy#10019. I should learn how to read. Sorry! (Though asking Sphinx to not require the \ also still could make sense to me, though. But it's going to be harder to implement?)

YannickJadoul avatar Feb 04 '21 12:02 YannickJadoul

@AWhetter Would it be less intrusive to make sphinx and stubgen support pybind11 overloading syntax? FYI: A set of changes to stubgen is on the way: https://github.com/python/mypy/pull/9903

sizmailov avatar Feb 06 '21 14:02 sizmailov

It would be less intrusive for pybind, yes. However I'm trying to make everything agree on a convention that is also used outside of pybind, rather than have everything support pybind's way of doing it.

You're right that we should be looking for a solution that minimises intrusion though, and I think that @YannickJadoul has identified that solution; get Sphinx to support finding signatures without a backslash, stubgen would require no changes, and get pybind to put function signatures at the beginning of the docstring.

AWhetter avatar Feb 06 '21 17:02 AWhetter

Ok, that seems reasonable.

What about signatures that span multiple lines? Presumably pybind11 would always produce exactly one signature per line, but we are talking about 3rd party tools too. How should it be handled? One can argue for a like-python approach: keep parsing new lines until signature is syntactically complete.

Example:

foo(x: int) -> Callable[
              Optional[Tuple[int, int]]
         ]
foo(x: str) -> Callable \ 
         [
              Optional[Tuple[str, str]]
         ]
foo() -> None

Alternatively, multiline signatures can be disallowed/unsupported, but this seems to be unnecessary restriction. What do you think?

sizmailov avatar Feb 06 '21 18:02 sizmailov

For that to work I think we would have to change the signature parsing of Sphinx so that it matches what stubgen does where it tokenises that string and properly parses the signatures. At the moment Sphinx is doing a basic regex search on each line of the docstring. It's definitely possible. The change I'm about to make to Sphinx is quite small, but you could submit an issue to Sphinx to see if they would consider a bigger change to the signature parsing?

AWhetter avatar Feb 06 '21 18:02 AWhetter

The sphinx choice of trailing backslash as indication for multiple signatures become fairly unfortunate one. It might result into a cumbersome signature parsing implementation which supports both "legacy" and "new" versions... but is doable for sure. I may have a too few clock cycles for that in next months, can't promise, sorry :disappointed:

Probably it's not the issue that requires immediate solution, but still it's worth to think about ahead of time.

sizmailov avatar Feb 06 '21 18:02 sizmailov

Good news! Sphinx v4 will support the function signatures separated without backslashes. I've updated this pull request to not include separating backslashes. I've done another small end to end test using an install of Sphinx from its "master" branch and using stubgen from the latest mypy, and both work with the docstrings that pybind outputs from this pull request. If disable_section_headings() is not set, stubgen outputs duplicate type signatures but that's a separate issue.

AWhetter avatar Apr 04 '21 17:04 AWhetter

Hello, just dropping a little message to see if there is still some interest from maintainers into merging this? I'd personally be interested in getting that to use it on a project I contribute to (https://github.com/PixarAnimationStudios/OpenTimelineIO).

Thanks!

This looks useful to me, the change is relatively small, and the added documentation is well done.

Caveat: I don't know much about sphinx and stubgen (don't use it in my own work) and haven't worked directly in or around the changed code in pybind11.h.

My main concern is backward compatibility. Specifically:

instead of the generic signature that is prepended currently.

How much trouble might this cause? — I honestly don't know, but from general experience, I'd bet some tool chains will break.

What's the extra cost for adding another option, to get back the generic signature? (Seems small.)

What are the arguments for opt-in (old behavior is the default) vs opt-out (new behavior is the default)?

Gut feeling: assuming there will only be a fairly small group of affected users, opt-out would seem like a good choice.


@AWhetter is there a chance that you could rebase this PR (as is)? Then I could run it through our (Google) global testing to see if there are any surprises.

rwgk avatar Dec 29 '21 09:12 rwgk

Thanks for updating this PR!

It turns out, this PR breaks one test (out of millions). The breaking test involves this unexpected diff:

@@ -6,4 +6,8 @@
     def __init__(self, arg0: int) -> None: ...
     @overload
     def __init__(self, arg0: Span[str]) -> None: ...
+    @overload
+    def __init__(self, arg0: int) -> None: ...
+    @overload
+    def __init__(self, arg0: Span[str]) -> None: ...
     def set_current_measurement(self, value: Span[float], timestamp, sensor_index: int = ...) -> None: ...

Apparently the docstrings are mined to build .pyi files, in this case picking up the function signatures twice. By patching bool show_section_headings = false; in pybind11/options.h this particular issue is fixed, but there is still another type of failure:

-    def get_current(self, arg0: int) -> float: ...
+    def get_current(self, *args, **kwargs) -> Any: ...

I'd need to dig in to understand this one, but I'm convinced already, we will force users into difficult situations by changing the docstrings abruptly. I feel this change needs to be introduced gradually, e.g. by adding the new feature as opt-in (old behavior still the default) in the next pybind11 release, then flipping the default to the new behavior a few months later. Maybe the code for the old behavior can be removed completely after a year or two.

I overlooked before: It would be much cleaner to add new tests to test_docstring_options, designed to cover all combinations of options and corner cases, and not to change test_factory_constructors (at least until the new behavior is made the default).

rwgk avatar Dec 31 '21 02:12 rwgk

Apparently the docstrings are mined to build .pyi files, in this case picking up the function signatures twice.

Is stubgen being used to do this and does each overload have its own docstring? It currently pulls the signature from both the signature list at the top of the docstring and from each of the section headings, but keeps the duplicates (mentioned here originally: https://github.com/pybind/pybind11/pull/2621#issuecomment-813069464).

I don't know what's going on with get_current without more information so let me know when you find something.

AWhetter avatar Jan 05 '22 18:01 AWhetter

As for disabling this behaviour by default, it will require adding another option and I don't love that this pull request would then increase the number of options from two to four. It'll also make what's already a complex part of pybind code even more difficult to understand. To have to do that to protect such a small percentage of users doesn't seem worth it to me. I'm happy accept whatever you as maintainers think is best but I felt that it was worth raising.

AWhetter avatar Jan 05 '22 19:01 AWhetter

it will require adding another option and I don't love that this pull request would then increase the number of options from two to four.

I feel the same, but based on past experience working with a huge codebase, and mixed use internal vs OSS, the alternative is far worse. That's the price we pay for success: seemingly simple changes are not actually simple anymore. I think we'll break a fair number of people. The Google global testing is a bit like a canary in the coal mine in this respect. Just for me, navigating a pybind11 upgrade for Google with that one broken test suite will cost me more time than the extra time needed to add the option. If some other breakage pops up later, for example in a OSS rollout a month later, rollbacks will reach nightmare level.

It'll also make what's already a complex part of pybind code even more difficult to understand.

I don't think it's not that complex, luckily. At least when compared to 10, or 100, or more (nobody knows) people having to deal with complicated rollbacks and roll forwards.

rwgk avatar Jan 05 '22 21:01 rwgk

I've made the requested changes. The test failures look unrelated to my changes. I did try out using a single new option to switch between the current docstrings, and a docstring with prepended overload signatures and no section heading, but this was harder to explain in the documentation and I can see legitimate reasons for wanting to keep section headings but still prepend the overload signatures to the docstring. Maybe we can consolidate and clean up the options in the next major version.

AWhetter avatar Jan 06 '22 22:01 AWhetter

I'll close and reopen after a few seconds to trigger a CI rerun, to see if that shakes out the pip errors (I saw them in another PR yesterday but then it worked again a few hours ago).

rwgk avatar Jan 07 '22 01:01 rwgk

I glanced back and forth a few times and it looks great to me. (Unfortunately the force pushes make it difficult to re-review. For the future, please just add commits, force-push only after rebase on master. The commits will be squashed when merging.)

I'll run this through the Google global testing again tonight.

rwgk avatar Jan 07 '22 01:01 rwgk

I can separate out my latest changes from the original commits if you like? I wasn't sure if you would squash them or if that was up to me.

AWhetter avatar Jan 07 '22 01:01 AWhetter

No it's fine, no need to spend extra time. I just submitted the PR for the global testing. If there are no hiccups (rare), results tomorrow morning (Pacific).

rwgk avatar Jan 07 '22 01:01 rwgk

Global test results look good! (Sorry I had to spend more time than usual deflaking).

I'm thinking, for completeness it would be ideal to add a 4th new test with just disable_section_headings(). What do you think? (We'll also be sure that the line breaks work out as desired.)

rwgk avatar Jan 07 '22 20:01 rwgk

Yes good idea. I'll get this added.

AWhetter avatar Jan 08 '22 04:01 AWhetter

The force push was for git rebase master.

rwgk avatar Jan 14 '22 19:01 rwgk

Apologies for taking so long to get to this. It's been a busy week. I've added the additional test. The CI fail looks unrelated. Maybe there's a flaky test because I think it's the same error message as last time.

AWhetter avatar Jan 15 '22 00:01 AWhetter