sphinx icon indicating copy to clipboard operation
sphinx copied to clipboard

Intersphinx delegation to domains

Open jakobandersen opened this issue 4 years ago • 11 comments

Feature or Bugfix

  • Feature
  • Bugfix
  • Refactoring

Purpose

Currently, Intersphinx is performing lookups in inventories purely by string comparison. Unfortunately this does not work in general. For C and especially C++ more complex processing needs to be done.

As an overall goal for Intersphinx: provide cross-referencing to external projects as if those projects were merely a part of the current project. In order to do this I believe Intersphinx must delegate most processing of inventories and lookup to the domains. This means both a new inventory format is needed and changes to lookup. This PR tries to address the latter. See the bottom for notes on the inventory format.

Detail

  • When loading inventories, group entries by domain and object type, and package data only related to Intersphinx in a new class sphinx.util.inventory.InventoryItemSet. This class is private for Intersphinx.
  • Intersphinx maintains domain storage of arbitrary type, initialized by a clone of a new domain variable initial_intersphinx_inventory. This mirrors the ordinary domain storage through initial_data.
  • Data is then given to the domains through a new method intersphinx_add_entries_v2(), where the domain stores in the given domain storage. The v2 in the name refers to the current version of objects.inv (the v1 is converted to v2).
  • When Intersphinx gets the missing-reference event it will use two new domain methods intersphinx_resolve_xref. The previously filled domain storage is given here as well.
  • The domain must perform lookup and will end up with a (possibly empty) sphinx.util.inventory.InventoryItemSet which must be returned (the domains doesn't know that it is this class, just some arbitrary data that it got from Intersphinx earlier). Intersphinx will then perform disambiguation between duplicates across the inventories, and create the resulting nodes.

The old lookup code in intersphinx.py is now the default implementation in the Domain base class for backwards compatibility. The domain-specific things for std and py have been spliced out (though in a slightly haxy manner).

As a kind of proof-of-concept I have implemented custom handling in the C domain, where it reuses the ordinary symbol tables classes and lookup procedures. A test for lookup in relation to anonymous types illustrates that more lookups will work with this implementation.

Relates

  • #7819 will completely break intersphinx lookups without this PR.

Next steps

We should make a new inventory format where domains can store symbols in whichever format they like: https://github.com/orgs/sphinx-doc/discussions/12204

jakobandersen avatar Feb 24 '21 16:02 jakobandersen

Rebased to resolve conflicts. @tk0miya, do you by chance have time to review this before we get too close to the v4 release?

jakobandersen avatar Mar 11 '21 17:03 jakobandersen

Oh, sorry. I overlooked your PR at all.

I completely agree with you. It's difficult to resolve cross-references only on the intersphinx module and common inventory data structure. So the resolution process should be passed to the domains.

In my thought, it is better to export the whole of the env object (or the whole of the domain object) as an inventory database and import it as a real env (or domain) object on intersphinx. I think it's similar to this PR, but no new methods are needed. What do you think?

# image code
inventory_v3 = fetch_remote_inventory(url)
env = BuildEnvironment.from_inventory(inventory_v3)
cpp_domain = env.get_domain('cpp')
cpp_domain.resolve_xref(pending_xref)

tk0miya avatar Mar 13 '21 14:03 tk0miya

That would be delightfully simply, though I think we would still need domain-specific methods for the loading. The inventory may be generated by an old version, so BuildEnvironment.from_inventory() would need to delegate to the domains so they can do the migration of the data into the current environment. And we would of course need to support the current inventory formats as well.

As for lookup: the current intersphinx lookup does some extra tricks, like splitting of an inventory name and perform targeted lookup, or when multiple inventories have the same entry, then disambiguate it in a deterministic way. I know the C and C++ domains, and I assume the rest, does not directly support this kind of lookup. So in this way each inventory must be kept separate and intersphinx must in turn perform lookup. In a sense this is an extension of what is already going on: try projects in some predefined order, with the current project being the first. Though, I'm a bit unsure if this is a good idea, as best candidate lookup will not be possible. In C++ a target T<int> may be resolved to a class template specialization template<> class T<int>, but if no such specialization has been defined, it will resolve to the primary template template<typename T> class T (if it exists). If the inventories are not merged, with built-in disambiguation, this will not work in general as the first inventory might have the primary template, while a latter the specialization. However, this is probably a quite niche use case.

jakobandersen avatar Mar 13 '21 15:03 jakobandersen

The PR is now updated on top of 5.x where the intersphinx role has been added. I believe it is ready for another round of review.

jakobandersen avatar Jun 24 '22 08:06 jakobandersen

@jakobandersen The implementation looks good, I have a few comments:

  • I think it makes more sense to use a dict as the backing structure in InventoryItemSet
  • I don't see the point of the _v2 suffixes -- they are confusing and when we move to a better format for objects.inv, we would disambiguate by string-based versus structured, I imagine.
  • I think a few of the new functions (e.g. process_disabled_reftypes) and properties could do with a docstring to explain what they're for
  • Some style / typing points (e.g. using snake_case) and improving static typing
  • In general I think we should strongly avoid assert statements, we should either raise an error or remove the assert.

I've implemented most of my suggested changes in https://github.com/AA-Turner/sphinx/commits/intersphinx_delegation

I would like to merge this sooner rather than later if possible -- please let me know if you're happy with my patches on your branch. Adding docstrings/etc can wait until after this is merged if need be.

A

AA-Turner avatar Sep 01 '22 19:09 AA-Turner

I want to release 5.2.0, so I'm moving this to 5.3.0.

@jakobandersen please let me know if you've time to review, otherwise I can move forward to allow progress on the other changes.

A

AA-Turner avatar Sep 23 '22 17:09 AA-Turner

@AA-Turner, please go ahead with a release without this PR. I won't have time to review the next couple of weeks (either).

jakobandersen avatar Sep 23 '22 19:09 jakobandersen

I've resolved the merge conflicts here.

A

AA-Turner avatar Jan 02 '23 06:01 AA-Turner

I've also updated my branch with the suggested changes above (https://github.com/AA-Turner/sphinx/commits/intersphinx_delegation)

A

AA-Turner avatar Jan 02 '23 09:01 AA-Turner

The PR has now been resurrected and should be ready for review and merge. As we are now discussing a new intersphinx format I have reintroduced the v2 suffixes, as there the new format will require a v3 version of those entities.

jakobandersen avatar Mar 28 '24 14:03 jakobandersen

Since I'm busy until Sunday, I will not review that before next week. It's also better for me to have a fresh head and fresh view on that. I believe @chrisjsewell would also want to have a look at it.

picnixz avatar Mar 28 '24 15:03 picnixz