sphinx icon indicating copy to clipboard operation
sphinx copied to clipboard

[8.x] Remove Intersphinx Legacy Format and Fix intersphinx cache loading in incremental builds

Open picnixz opened this issue 1 year ago • 7 comments

Fix #11466.

  • Add test for intersphinx incremental builds.

So, the issue was that were ordering lists of (str, dict). However, this only occurred if we changed the URL of an intersphinx inventory and if there exists a cached inventory for the same project, e.g., building a project with intersphinx_mapping = {"foo": (url1, None)} and then building it again with intersphinx_mapping = {"foo": (url2, None)} was the issue. This bug only appeared in incremental build and by using -E once, it would disappear so it could have been hard to track for users in general.

picnixz avatar Oct 04 '23 14:10 picnixz

:white_check_mark: checked that the unit test does replicate the dict comparison problem reported (first, second) in the bugreport, and that the code change resolves the problem.

So I think this is good - I'd like to understand a bit more of the functionality of this component while we have the opportunity if possible (hence the other comments/questions).

jayaddison avatar Oct 05 '23 11:10 jayaddison

So I think this is good - I'd like to understand a bit more of the functionality of this component while we have the opportunity if possible (hence the other comments/questions).

Fine by me. I acknowledge that there are many obscure components and I only know about them by actually fighting against them. But feel free to ask questions.

picnixz avatar Oct 05 '23 11:10 picnixz

I still have one unanswered question about whether to remove or not old cached entries that will not be used at runtime. As I said, building with URI_A, then URI_B, then URI_A again would actually lead to inconsistencies where we would never use the data from the latest URI_A and instead always use the data from URI_B.

I need to think about it this week-end, possibly layout out the corner cases. For now, I'll convert it into a draft PR and I'll make it a real PR when ready.

picnixz avatar Oct 05 '23 12:10 picnixz

I've reviewed my comments and I don't understand them anymore. I think I either forgot some things or did not update them correctly. So I'll review from the beginning

picnixz avatar Feb 13 '24 16:02 picnixz

Ok, I've read what I did and there are some unclear parts. I'll do something tomorrow because it's still something I need to check with the rest of the codebase. I'm no more sure about the invariants and uniqueness.

picnixz avatar Feb 13 '24 18:02 picnixz

By working on this PR, I observe that having the legacy format for intersphinx mapping is making everything more complicated. As such, I'll simply assume that this PR is for 8.x and remove the old intersphinx format at the same time (making it work for both the old and the new formats is extremely hard and I prefer having breaking changes since it eases my life).

picnixz avatar Feb 14 '24 12:02 picnixz

I think I'm done with this one ! my previous analysis was too complex and a bit faulty so here is the updated solution:

  • Whenever we change a configuration value, the old entries are still kept if they are not overwritten. For instance, if I have
intersphinx_mapping = {'foo': ('foo.com', None), 'bar': ('bar.com', None)}

my cache will have the (project, uri) entries ('foo', 'foo.com') and ('bar', 'bar.com'). If I change my config and rebuild but uses

intersphinx_mapping = {'foo': ('bar.com', None)}

my cache will have the (project, uri) entries ('foo', 'foo.com') and ('foo', 'bar.com'). It does not make sense to keep the old link for the project, so we can remove it from the cache, even before querying the inventories (so no more race condition or whatever).

In addition, it does not make sense to have two projects with the same URI:

intersphinx_mapping = {'foo': ('foo.com', None), 'bar': ('foo.com', None)}

Currently we don't even ensure verify so when we do cache[uri] = ... we have no guarantee that the operation is correct depending on which thread is doing the operation. So this is a new precondition on the intersphinx_mapping that is now verified during the normalization (a warning will be emitted as well).

The legacy format was scheduled for deprecation in 6.2 for 8.x and since I removed what I could find related to that, this PR should only be either merged as the first one for 8.x (since it's deprecating an important format) or a separate PR focused on the validating and normalizing entries assuming the new format only.

picnixz avatar Feb 14 '24 17:02 picnixz

This is next on my review list (I'd like to try minifying/reducing the changes to explore the differences).

jayaddison avatar Mar 04 '24 23:03 jayaddison

I can refactor this PR and first remove the intersphinx legacy format in a separate one?

picnixz avatar Mar 05 '24 17:03 picnixz

I can refactor this PR and first remove the intersphinx legacy format in a separate one?

That would help me, although it's more work for you - I'm OK with either approach!

jayaddison avatar Mar 05 '24 18:03 jayaddison

I think that 'either' was unclear. I mean that it's either OK to separate the legacy-format-removal, or OK not to and I'll review this code as-is. Just let me know what you prefer and have time for.

jayaddison avatar Mar 05 '24 18:03 jayaddison

Simce this PR still requires to get to 8.x either way, I think it's preferable to split it into two. That way we can always deprecate the things but still reconsider how to fix the other issue.

picnixz avatar Mar 05 '24 20:03 picnixz

Ok, closing this one for now. I'll open a separate one based on #12083.

picnixz avatar Mar 14 '24 10:03 picnixz