sphinx Bugfix: Make attribute/"thing" selector much more lax

Previously the selector expected a regex "word", corresponding to Python's variable token rules. But with some things like enum.Enum, typing.TypedDict & django.db.models.TextChoices, passed strings are converted to attributes, so those attribute names can contain arbitrary characters.

Fixes #10322

Of note: I've implemented this by accepting almost anything. Per the bug, we could instead just add a few common breakage characters like - or ; (i.e. [\w;-]), but that requires more foresight as to what might come up.
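To make the trade-off concrete, here is a minimal sketch comparing the three options. The patterns are simplified stand-ins written for this comment, not the actual signature regex in Sphinx, and the helper name `thing_name` is hypothetical.

```python
import re

# Simplified stand-ins for illustration only (NOT Sphinx's real signature regex).
# Each pattern is: optional dotted prefix, then the "thing" name.
STRICT = re.compile(r'^([\w.]*\.)?(\w+)\s*$')        # identifier-like names only
ALLOWLIST = re.compile(r'^([\w.]*\.)?([\w;-]+)\s*$')  # word chars plus ';' and '-'
LAX = re.compile(r'^([\w.]*\.)?(\S+)\s*$')            # almost anything non-blank


def thing_name(pattern, sig):
    """Return the captured name, or None when the signature is rejected."""
    m = pattern.match(sig)
    return m.group(2) if m else None


for sig in ('Color.red', 'Color.dark-red', 'Color.a+b'):
    print(sig, [thing_name(p, sig) for p in (STRICT, ALLOWLIST, LAX)])
```

The allowlist handles `dark-red` but still rejects `a+b`, which is the "foresight" problem: every new character has to be added by hand.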

Mar 04 '23 01:03 mjsir911

Neither the ruff issue nor the test failures in test_format_date are due to my changes.

Mar 06 '23 17:03 mjsir911

Can you fix the conflicts?
How do we document a special key containing ()?
Can we change the pattern so that escaped characters are gobbled in the attribute name? (and let autodoc escape the field name when needed).

Aug 16 '23 16:08 picnixz

How do we document a special key containing ()?

I'm not sure if the current regex can allow ambiguity between a field called () & optional arguments list coming after that field, looking into how to regex this one out.

Can we change the pattern so that escaped characters are gobbled in the attribute name? (and let autodoc escape the field name when needed).

Confused about this one too, could you elaborate?

Aug 19 '23 02:08 mjsir911

I'm not sure if the current regex can allow ambiguity between a field called () & optional arguments list coming after that field, looking into how to regex this one out.

It would be good to test that, then, and include more pathological field names.

Confused about this one too, could you elaborate?

Yeah, now that I reread my comment, it is confusing. What I meant is that:


.. py:attribute:: MyEnum.a\(\)
   
   The ``a()`` member.

would be parsed correctly because of the backslashes, so you won't end up with just a as the name.
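A rough sketch of what "gobbling escaped characters" could look like, assuming a simplified pattern (not the one Sphinx uses) and a hypothetical `parse` helper: the name is a run of word characters or backslash-escaped characters, and the backslashes are stripped afterwards.

```python
import re

# Hypothetical, simplified pattern (NOT Sphinx's real signature regex):
# optional dotted prefix, a name made of word characters or backslash-escaped
# characters, then an optional parenthesised argument list.
SIG = re.compile(r'^([\w.]*\.)?((?:\\.|\w)+)\s*(?:\((.*)\))?$')


def parse(sig):
    """Split a signature into (prefix, unescaped name, args), or return None."""
    m = SIG.match(sig)
    if m is None:
        return None
    prefix, name, args = m.groups()
    # Drop the escaping backslashes: r'a\(\)' becomes 'a()'.
    name = re.sub(r'\\(.)', lambda esc: esc.group(1), name)
    return prefix, name, args


print(parse(r'MyEnum.a\(\)'))     # escaped parentheses stay in the name
print(parse('MyEnum.a()'))        # unescaped parentheses are an argument list
print(parse(r'MyEnum.a\(\)(x)'))  # both at once
```

With this shape, a field literally named `a()` and a field `a` followed by an empty argument list are no longer ambiguous, because only unescaped parentheses start the argument list.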

Aug 19 '23 08:08 picnixz

Thank you for the work, but I have the alternative #11937. Personally, I am a bit reluctant to allow those characters unless we have a clear idea of how to make this easily extensible without always having to add the special characters manually or exclude some from the regex.

AFAICT, the issue is essentially because of the signature parser and not something else. Would it make sense to actually do two passes instead of one where in the second pass, you actually try to pick the signature not from the string directly but by additionally using dynamic information of the object being documented if available (e.g., you know that it is an enum so you try to generate the signature a bit differently for enums, same idea for TypedDict or constructions that allow for arbitrary characters)?
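As a very rough illustration of the two-pass idea (everything here is hypothetical: the `parse_member` helper, the simplified pattern, and the restriction to enums are assumptions for the sketch, not existing Sphinx code):

```python
import enum
import re

# First pass: simplified strict pattern (NOT Sphinx's real signature regex).
STRICT = re.compile(r'^(\w+)\s*(?:\((.*)\))?$')


def parse_member(sig, owner=None):
    """Return (name, args), or None if the signature cannot be resolved."""
    # Pass 1: purely textual parsing.
    m = STRICT.match(sig)
    if m:
        return m.group(1), m.group(2)
    # Pass 2: fall back to dynamic information about the documented object.
    # Only enums are handled here; TypedDict etc. would need their own branch.
    if isinstance(owner, type) and issubclass(owner, enum.Enum):
        if sig in owner.__members__:
            return sig, None
    return None


# The functional API allows member names that are not valid identifiers.
Color = enum.Enum('Color', {'red': 1, 'dark-red': 2})

print(parse_member('red', Color))
print(parse_member('dark-red', Color))
print(parse_member('not-a-member', Color))
```

The regex itself stays strict; arbitrary names are only accepted when the live object confirms that such a member exists.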

I can investigate a bit more upon my return but I would like to have a clean solution for this specific issue (which would also solve other issues related to non-word characters).

Mar 07 '24 08:03 picnixz

Kind poke @AA-Turner maybe?

I was hoping to see this land in Sphinx 8.

I'm not sure about all of the code, but we've cherry-picked the regexp change alone into our conf.py so that TypedDict keys are documentable.

Here's our code:

Sphinx 6: https://github.com/canonical/operator/blob/828b542d521930c8e225c313eac6995ad7c6d0ea/docs/custom_conf.py#L26-L38
Sphinx 8: https://github.com/canonical/operator/blob/233aee2e3da109f523c81f4014ded63fe53792db/docs/custom_conf.py#L26-L39 (one line diff wrt. the above)

Example Python code where a fix is needed:

https://github.com/canonical/operator/blob/828b542d521930c8e225c313eac6995ad7c6d0ea/ops/pebble.py#L103-L118

Aug 19 '24 00:08 dimaqq