sphinx
sphinx copied to clipboard
Intersphinx delegation to domains
Feature or Bugfix
- Feature
- Bugfix
- Refactoring
Purpose
Currently, Intersphinx is performing lookups in inventories purely by string comparison. Unfortunately this does not work in general. For C and especially C++ more complex processing needs to be done.
As an overall goal for Intersphinx: provide cross-referencing to external projects as if those projects were merely a part of the current project. In order to do this I believe Intersphinx must delegate most processing of inventories and lookup to the domains. This means both a new inventory format is needed and changes to lookup. This PR tries to address the latter. See the bottom for notes on the inventory format.
Detail
- When loading inventories, group entries by domain and object type, and package data only related to Intersphinx in a new class
sphinx.util.inventory.InventoryItemSet. This class is private for Intersphinx. - Intersphinx maintains domain storage of arbitrary type, initialized by a clone of a new domain variable
initial_intersphinx_inventory. This mirrors the ordinary domain storage throughinitial_data. - Data is then given to the domains through a new method
intersphinx_add_entries_v2(), where the domain stores in the given domain storage. Thev2in the name refers to the current version ofobjects.inv(the v1 is converted to v2). - When Intersphinx gets the
missing-referenceevent it will use two new domain methodsintersphinx_resolve_xref. The previously filled domain storage is given here as well. - The domain must perform lookup and will end up with a (possibly empty)
sphinx.util.inventory.InventoryItemSetwhich must be returned (the domains doesn't know that it is this class, just some arbitrary data that it got from Intersphinx earlier). Intersphinx will then perform disambiguation between duplicates across the inventories, and create the resulting nodes.
The old lookup code in intersphinx.py is now the default implementation in the Domain base class for backwards compatibility. The domain-specific things for std and py have been spliced out (though in a slightly haxy manner).
As a kind of proof-of-concept I have implemented custom handling in the C domain, where it reuses the ordinary symbol tables classes and lookup procedures. A test for lookup in relation to anonymous types illustrates that more lookups will work with this implementation.
Relates
- #7819 will completely break intersphinx lookups without this PR.
Next steps
We should make a new inventory format where domains can store symbols in whichever format they like: https://github.com/orgs/sphinx-doc/discussions/12204
Rebased to resolve conflicts. @tk0miya, do you by chance have time to review this before we get too close to the v4 release?
Oh, sorry. I overlooked your PR at all.
I completely agree with you. It's difficult to resolve cross-references only on the intersphinx module and common inventory data structure. So the resolution process should be passed to the domains.
In my thought, it is better to export the whole of the env object (or the whole of the domain object) as an inventory database and import it as a real env (or domain) object on intersphinx. I think it's similar to this PR, but no new methods are needed. What do you think?
# image code
inventory_v3 = fetch_remote_inventory(url)
env = BuildEnvironment.from_inventory(inventory_v3)
cpp_domain = env.get_domain('cpp')
cpp_domain.resolve_xref(pending_xref)
That would be delightfully simply, though I think we would still need domain-specific methods for the loading. The inventory may be generated by an old version, so BuildEnvironment.from_inventory() would need to delegate to the domains so they can do the migration of the data into the current environment. And we would of course need to support the current inventory formats as well.
As for lookup: the current intersphinx lookup does some extra tricks, like splitting of an inventory name and perform targeted lookup, or when multiple inventories have the same entry, then disambiguate it in a deterministic way.
I know the C and C++ domains, and I assume the rest, does not directly support this kind of lookup. So in this way each inventory must be kept separate and intersphinx must in turn perform lookup. In a sense this is an extension of what is already going on: try projects in some predefined order, with the current project being the first.
Though, I'm a bit unsure if this is a good idea, as best candidate lookup will not be possible. In C++ a target T<int> may be resolved to a class template specialization template<> class T<int>, but if no such specialization has been defined, it will resolve to the primary template template<typename T> class T (if it exists). If the inventories are not merged, with built-in disambiguation, this will not work in general as the first inventory might have the primary template, while a latter the specialization. However, this is probably a quite niche use case.
The PR is now updated on top of 5.x where the intersphinx role has been added. I believe it is ready for another round of review.
@jakobandersen The implementation looks good, I have a few comments:
- I think it makes more sense to use a dict as the backing structure in
InventoryItemSet - I don't see the point of the
_v2suffixes -- they are confusing and when we move to a better format forobjects.inv, we would disambiguate by string-based versus structured, I imagine. - I think a few of the new functions (e.g.
process_disabled_reftypes) and properties could do with a docstring to explain what they're for - Some style / typing points (e.g. using
snake_case) and improving static typing - In general I think we should strongly avoid
assertstatements, we should either raise an error or remove the assert.
I've implemented most of my suggested changes in https://github.com/AA-Turner/sphinx/commits/intersphinx_delegation
I would like to merge this sooner rather than later if possible -- please let me know if you're happy with my patches on your branch. Adding docstrings/etc can wait until after this is merged if need be.
A
I want to release 5.2.0, so I'm moving this to 5.3.0.
@jakobandersen please let me know if you've time to review, otherwise I can move forward to allow progress on the other changes.
A
@AA-Turner, please go ahead with a release without this PR. I won't have time to review the next couple of weeks (either).
I've resolved the merge conflicts here.
A
I've also updated my branch with the suggested changes above (https://github.com/AA-Turner/sphinx/commits/intersphinx_delegation)
A
The PR has now been resurrected and should be ready for review and merge.
As we are now discussing a new intersphinx format I have reintroduced the v2 suffixes, as there the new format will require a v3 version of those entities.
Since I'm busy until Sunday, I will not review that before next week. It's also better for me to have a fresh head and fresh view on that. I believe @chrisjsewell would also want to have a look at it.