confluencebuilder Suggestion: improve support for sphinx.ext.intersphinx beyond the "std" domain

In https://github.com/sphinx-contrib/confluencebuilder/issues/338 you implemented basic support for sphinx.ext.intersphinx, stating that the initial release only creates std:doc and std:ref entries for documents (pages) and titles. I suggest improving this and generating a more complete inventory that includes the other domains, as this restriction seems unnecessary or at least too broad. I have been experimenting with the current implementation (based on the development version 1.9.0.dev0) and found that with a simple code change I was able to get inventory entries for the "py" domain that equal those generated by a Sphinx HTML build of the same documentation set. This made the inventory much more complete, and another documentation set was able to resolve many more intersphinx references than before.

The patch is in sphinxcontrib.confluencebuilder.intersphinx.py, inside the function build_intersphinx(builder):

        for domainname, domain in sorted(builder.env.domains.items()):
#            if domainname == 'std':
            if domainname in ('std', 'py'):
                for name, dispname, typ, docname, raw_anchor, prio in sorted(
                        domain.get_objects()):

I have not tried this for any other domain than "py" but it will probably work in the same way. Maybe this restriction could be lifted completely.
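To illustrate what lifting the restriction means, here is a minimal sketch of the loop with the domain check made optional. It is not the extension's actual code: StubDomain and collect_entries are hypothetical names, and only the shape of the tuples returned by get_objects() is taken from the snippet above.

```python
# Hypothetical stand-in for a Sphinx domain; only get_objects() is modeled.
class StubDomain:
    def __init__(self, objects):
        self._objects = objects

    def get_objects(self):
        # Tuples follow the shape used in the patch above:
        # (name, dispname, type, docname, anchor, priority)
        return iter(self._objects)


def collect_entries(domains, allowed=None):
    """Collect inventory rows; allowed=None means no domain restriction."""
    rows = []
    for domainname, domain in sorted(domains.items()):
        if allowed is not None and domainname not in allowed:
            continue
        for name, dispname, typ, docname, anchor, prio in sorted(
                domain.get_objects()):
            rows.append((f"{domainname}:{typ}", name, docname, anchor, prio))
    return rows


domains = {
    "std": StubDomain([("index", "Index", "doc", "index", "", -1)]),
    "py": StubDomain([("pkg.func", "pkg.func", "function", "api", "pkg.func", 1)]),
}
std_only = collect_entries(domains, allowed=("std",))  # original behaviour
everything = collect_entries(domains)                  # restriction lifted
```

With the restriction in place only the std:doc row is collected; without it, the py:function row is included as well.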

To be able to use an intersphinx mapping with more than one inventory for documentation sets hosted on the same Confluence instance (so that the target URL for each inventory is the same), I had to implement better caching logic in sphinx.ext.intersphinx. I have raised this topic with the developers of that extension in https://github.com/sphinx-doc/sphinx/issues/10494.

Finally, I suggest storing the generated objects.inv together with the generated documentation, as an attachment of the root doc page. It should then be possible to reference it in an intersphinx mapping with a URL like https://your_confluence_server/download/attachments/root_page_id/objects.inv?api=v2 . However, I have not tried this yet and do not know whether it would work for both Server and Cloud.
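As a rough sketch of what such a mapping could look like in a consuming project's conf.py: the (target, inventory) tuple form is the standard intersphinx_mapping syntax, while the helper name, the server name, the page id and the mapping key are all made up for illustration. As noted, this is untested against a real Confluence instance.

```python
def attachment_inventory_url(base_url, page_id, filename="objects.inv"):
    """Build the attachment download URL in the form suggested above."""
    base = base_url.rstrip("/")
    return f"{base}/download/attachments/{page_id}/{filename}?api=v2"


# Assumed values: replace with the real server and root page id.
CONFLUENCE_BASE = "https://confluence.example.com"
ROOT_PAGE_ID = "123456"

# conf.py of the project that wants to link to the Confluence-hosted docs:
# the first tuple element is the link target base, the second is where the
# inventory file is fetched from.
intersphinx_mapping = {
    "otherdocs": (
        CONFLUENCE_BASE,
        attachment_inventory_url(CONFLUENCE_BASE, ROOT_PAGE_ID),
    ),
}
```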

Jun 10 '22 14:06 RobertSeeger

@RobertSeeger,

With respect to your first point, the final implementation should not include the explicit domainname check that was added in the initial release. The only reason for the restriction was limited test sets and testing (I wanted to avoid breaking a user's documentation build or generating a bad Intersphinx database for a domain which may have needed more care). If removing the restriction works for your documentation set, which uses multiple domains, maybe the restriction should just be dropped. It would be nice to have some sample sets and unit tests to validate that we are generating "proper" Intersphinx databases.
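One possible shape for such a check is a round trip through the inventory file format. The sketch below is a simplified, standalone reading of the version 2 format (four plain-text header lines followed by a zlib-compressed body); dump_inventory and load_inventory are hypothetical helpers, and a real unit test would more likely load the generated file with Sphinx's own inventory loader.

```python
import re
import zlib

HEADER = (
    "# Sphinx inventory version 2\n"
    "# Project: {project}\n"
    "# Version: {version}\n"
    "# The remainder of this file is compressed using zlib.\n"
)
# name, domain:role, priority, uri, display name
ENTRY = re.compile(r"(.+?)\s+(\S+:\S+)\s+(-?\d+)\s+(\S*)\s+(.*)")


def dump_inventory(project, version, entries):
    """Serialize (name, role, prio, uri, dispname) rows into inventory bytes."""
    body = "".join(
        f"{name} {role} {prio} {uri} {disp}\n"
        for name, role, prio, uri, disp in entries
    )
    head = HEADER.format(project=project, version=version)
    return head.encode("utf-8") + zlib.compress(body.encode("utf-8"))


def load_inventory(data):
    """Parse inventory bytes into {role: {name: (uri, dispname)}}."""
    parts = data.split(b"\n", 4)  # four header lines, then compressed body
    if len(parts) != 5 or parts[0] != b"# Sphinx inventory version 2":
        raise ValueError("not a version 2 inventory")
    result = {}
    for line in zlib.decompress(parts[4]).decode("utf-8").splitlines():
        match = ENTRY.match(line)
        if match is None:
            continue  # skip lines this simplified parser cannot read
        name, role, _prio, uri, disp = match.groups()
        if uri.endswith("$"):  # "$" is shorthand for the entry name
            uri = uri[:-1] + name
        if disp == "-":  # "-" means the display name equals the name
            disp = name
        result.setdefault(role, {})[name] = (uri, disp)
    return result


inventory = load_inventory(dump_inventory("demo", "1.0", [
    ("index", "std:doc", -1, "index.html", "Index Page"),
    ("pkg.func", "py:function", 1, "api.html#$", "-"),
]))
```

A test could then assert that every domain present in the build environment shows up in the parsed result with the expected targets.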

Storing a generated objects.inv on a Confluence instance should be possible. From a quick look, we may need to move Intersphinx generation so that it runs between document publishing and asset publishing (images/downloads), and register the inventory file with a method similar to ConfluenceAsset.process_file_node that targets the root document. We would most likely want to ensure the following as well:

- A first thought was that automatic uploading of this file should not be enabled by default, on the assumption that most use cases will not rely on this feature, and we would want to avoid uploading attachments when possible. However, maybe it would be good practice just to do it, so the file is there for anyone to use (if the publisher does not mind, it would be available for other users and their documents to reference). Python's own objects.inv file is only a couple of KB, which is nowhere near as big a file size hit as the images commonly used in documents. There should be an option to allow a user to enable or disable the uploading of this file (separate from enabling Intersphinx).
- Possibly prevent the filename objects.inv from being used with the :download: role, so it does not clobber the database.

Jun 12 '22 12:06 jdknight

A recent change has been added into the main branch (#690) to extend intersphinx's capabilities.

The restriction to only the std domain has been dropped. Sanity checks when building Sphinx's own documentation (which uses a couple of domains) did not show any issues (in terms of generation). While not all populated inventory entries may be correct at this time (since we may not be generating some anchor targets to link against), it is assumed that references should at least link to the correct pages. Removing this restriction should make it easier for users to find out which specific intersphinx references do not work; these can then be reported in this issue tracker to be looked at and improved in future versions.

We also now publish the objects.inv file to the configured root document's page (as an attachment). This should provide an easy way for users to acquire an inventory file without a publisher needing to manually copy/move the file to some other hosting solution. Note that, based on some testing, registering the Confluence attachment directly in an intersphinx_mapping configuration may not yield the expected results for some users. Without getting into authentication issues, one thing noticed is that if the intersphinx extension downloads these attachments, Confluence Cloud will report an alternative location for these files. For example:

https://<SPACE-KEY>.atlassian.net/wiki/download/attachments/<FID>/objects.inv?api=v2

can result in the following HTTP headers being reported:

HTTP/2 302
date: Sun, 17 Jul 2022 20:11:26 GMT
content-type: text/html;charset=UTF-8
...
location: https://api.media.atlassian.com/file/<UID>&name=object.inv&dl=true
...

This in turn causes links to be generated against a https://api.media.atlassian.com/... base URL, which is not desired. Unless there is an option in intersphinx to override the base URL for a given mapping, a change may need to be added to intersphinx to help deal with this scenario.
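A small sketch of how this situation could be detected before wiring the attachment URL into a mapping: compare the host of the requested URL with the host named in the Location header. The function name and the example URLs are illustrative only; this is not part of intersphinx or of this extension.

```python
from urllib.parse import urlsplit


def redirect_leaves_host(requested_url, location):
    """Return True if a redirect points at a different host than requested."""
    requested_host = urlsplit(requested_url).netloc.lower()
    redirected_host = urlsplit(location).netloc.lower()
    return requested_host != redirected_host


# Illustrative values modeled on the headers above (ids are made up).
requested = "https://example.atlassian.net/wiki/download/attachments/123/objects.inv?api=v2"
redirected = "https://api.media.atlassian.com/file/abc"

moved_off_site = redirect_leaves_host(requested, redirected)
```

When this returns True, links derived from the final download location would point at the media host rather than at the wiki.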

In addition, authentication with intersphinx does not appear to be too flexible. It does appear that if one prefixes basic authentication details to the URL entry, it can download protected inventories. However, using basic authentication may not be usable for some Confluence instances. It would be nice if there was a way to customize/provide a unique Request session per inventory mapping. And if something were to be possible, it would be nice if the Confluence builder extension could hint to the intersphinx extension to use already configured authentication details (if applicable to the mapping target). But again, the ability to tweak intersphinx's Request sessions and API for other extensions to attach to would require work directly on the intersphinx extension (assuming this is a good idea).
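For reference, prefixing basic authentication details to a URL entry could be sketched as below. with_basic_auth is a hypothetical helper, the host, page id and credentials are placeholders, and percent-encoding the credentials is an assumption about what the consuming side accepts.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_basic_auth(url, username, secret):
    """Return url with percent-encoded user:secret@ prefixed to the host."""
    parts = urlsplit(url)
    userinfo = f"{quote(username, safe='')}:{quote(secret, safe='')}"
    netloc = f"{userinfo}@{parts.netloc}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment))


# Placeholder values; real credentials should not be committed to conf.py.
inventory_url = with_basic_auth(
    "https://confluence.example.com/download/attachments/123456/objects.inv?api=v2",
    "docs-bot",
    "s3cr3t/token",
)
```

In practice the secret would more likely be read from an environment variable than written into the configuration file.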

We can leave this issue open for now, indicating that an enhancement has been made. If you (or others) experience issues with specific intersphinx references, we can add them here to be addressed. Once the next release is made (v1.9), I will mark this issue as closed. And if/when more specific intersphinx issues are found, users can create new issues (ideally, with examples) with more information on the specific failure.

Jul 23 '22 19:07 jdknight

v1.9 is now available on PyPI, which should include unrestricted domain processing and automatic publishing of the inventory to the root document -- marking as closed.

Users experiencing specific issues with Intersphinx not providing expected links or anchors for certain domains are welcome to create specific issues outlining these cases. We can try to improve this capability in this extension over time.

Aug 21 '22 18:08 jdknight
