sphinx
sphinx copied to clipboard
Incomplete documentation for env_version
Describe the bug
https://www.sphinx-doc.org/en/master/extdev/index.html#extension-metadata documents env_version.
-
[ ] It says:
[...] env data structure if the extension stores any data to environment
... but it gives no indication how/why to do that. It could e.g. link to https://www.sphinx-doc.org/en/master/development/tutorials/todo.html
-
[ ] It says:
If not given, Sphinx considers the extension does not stores any data to environment.
... but it doesn't say if that includes a
Domain(https://www.sphinx-doc.org/en/master/extdev/domainapi.html) defined by the extension. I assume that the domain is somehow stored in the environment, but I guess it doesn't need to be reflected inenv_versionbecause it tracks its own structure with thedata_versionattribute? -
[x] https://www.sphinx-doc.org/en/master/development/tutorials/todo.html shows an extension that stores data in the environment, but it does not return
env_versionin itssetup()function!- potential fix: #12349
-
[ ] https://www.sphinx-doc.org/en/master/development/tutorials/todo.html could mention that alternatively, a
Domaincould be used to store data in the environment. It could link to https://www.sphinx-doc.org/en/master/development/tutorials/recipe.html, which provides an example for that. -
[x] https://www.sphinx-doc.org/en/master/development/tutorials/recipe.html does not mention
data_version, which is of course not strictly necessary, but I think it might be a good idea to recommend specifying it, to make it easier to not forget bumping it once there are changes to the data at a later time.- potential fix: #12350
-
[x] This has nothing to do with
env_version, but looking at the tutorials, I have seen that https://www.sphinx-doc.org/en/master/development/tutorials/autodoc_ext.html does not setparallel_read_safebut I guess there is no reason for this extension to disable parallelization, right?- this was fixed in #12288
-
[ ] The docstring of
Domaindoesn't give any information whether it is recommended to use it to store arbitrary data (see https://github.com/sphinx-doc/sphinx/issues/12237#issuecomment-2047118111)
How to Reproduce
Look at https://www.sphinx-doc.org/en/master/extdev/index.html#extension-metadata
@chrisjsewell You introduced the typed interface for that one so perhaps you could take that one?
you introduced the typed interface for that one so perhaps you could take that one?
Well, that's one of the reasons I introduced it; because for a long time I had no idea env_version was even an option 😅
But indeed thank you for detailing this @mgeier, it would be good to make the documentation around it clearer
alternatively, a Domain could be used to store data in the environment.
I would note here, that I feel the purpose of a domain is more around handling referencing, and then the storage of data here is more of a "side-effect" of that; to store pointers to targets, etc I don't believe you should use it to store arbitrary data.
I don't believe you should use it to store arbitrary data.
This kinda says the opposite: https://github.com/sphinx-doc/sphinx/issues/9003#issuecomment-799084236
Would you suggest adding arbitrary attributes to app.env instead?
Or do you have something even better?
And it's not only about storing data, it's also about making it work with parallel builds, which is actually hard, see e.g. https://github.com/sphinx-doc/sphinx/pull/12251.
And to be fancy, it's about removing the appropriate data if a source file is removed (which the "todo" extension does but the "recipe" extension doesn't, AFAICT).
For a concrete example, see https://github.com/mgeier/sphinx-last-updated-by-git/pull/66. I'm not sure if that's better or worse, BTW!
This kinda says the opposite
well that is not consistent with what the actual source says: https://github.com/sphinx-doc/sphinx/blob/6fd8b3004315f56b4464d9426829f7643e7b7bc1/sphinx/domains/init.py#L159-L180
And it's not only about storing data, it's also about making it work with parallel builds, which is actually hard
We deal with this in sphinx-needs with parallel builds for large documentation sets, and it works quite well: https://github.com/useblocks/sphinx-needs/blob/0f71ecde47a3f7c3245c5bcf545ef5281376768c/sphinx_needs/needs.py#L266-L270
But I agree that it would be nice to have a "consistent" workflow for storing data (outside of references)
that is not consistent with what the actual source says
Well, the docstring you are quoting is nearly 15 years old, so maybe the usage recommendations of this have changed slightly in the meantime?
Maybe the docstring should be updated? I have added this to the task list above.
Is there anything except that docstring that makes you think that it is not appropriate to store arbitrary data in a Domain (as compared to adding arbitrary attributes to app.env)?
We deal with this in sphinx-needs with parallel builds for large documentation sets, and it works quite well: https://github.com/useblocks/sphinx-needs/blob/0f71ecde47a3f7c3245c5bcf545ef5281376768c/sphinx_needs/needs.py#L266-L270
Thanks for this example, this is really interesting, since it seems to be a perfect fit for a Domain!
The SphinxNeedsData class looks like it could be perfectly derived from Domain:
https://github.com/useblocks/sphinx-needs/blob/0f71ecde47a3f7c3245c5bcf545ef5281376768c/sphinx_needs/data.py#L399-L597
Now the merging and invalidation of data is scattered over different places, a Domain would allow everything to be implemented in one place.
As an added benefit, this domain could then be used to namespace the (many) roles/directives to get something like needs:table instead of needtable, which would arguably a bit cleaner.
I'm of course not suggesting to do that now, because it would be too disruptive, but for a new project I would definitely consider it.
Ironically, the extension is not returning env_version, which makes Sphinx believe that nothing is stored in app.env, AFAIU: https://github.com/useblocks/sphinx-needs/blob/0f71ecde47a3f7c3245c5bcf545ef5281376768c/sphinx_needs/needs.py#L302-L306
Adding env_version has already been suggested in https://github.com/useblocks/sphinx-needs/issues/852#issuecomment-1934707627
Oh I forgot there is something more in-line with what you want: https://github.com/sphinx-doc/sphinx/blob/5562e2b032820043ce75ee8645fe0d8e2bbf4d11/sphinx/environment/collectors/init.py#L14
and can be added with https://github.com/sphinx-doc/sphinx/blob/5562e2b032820043ce75ee8645fe0d8e2bbf4d11/sphinx/application.py#L1221
you see it just links methods to event hooks though, ao nothing you couldn't implement on your own
so maybe the usage recommendations of this have changed slightly in the meantime? Is there anything except that docstring that makes you think that it is not appropriate to store arbitrary data in a
Domain
The whole of the Domain API is designed around storing and resolving references, this is the only thing it is used for in core sphinx, and there will likely be even more logic being added in this direction in the future: #8929
In addition, all roles are expected to be references to domain objects, and all directives are expected to create targets.
As I mentioned already, storing data is just a "side-effect" of reference resolution, not a primary goal of the Domain.
If all you want to do is store data, then you are bringing in a whole lot of things you probably don't need;
for example you literally have to implement a concrete resolve_any_xref method just to get it to work.
Adding env_version has already been suggested in https://github.com/useblocks/sphinx-needs/issues/852#issuecomment-1934707627
yep by me 😄
Just my 2 cents:
- I don't store information in domains, and rather store everything in the Environment.
Would you suggest adding arbitrary attributes to app.env instead?
In my own extension, I have a single dictionary which is storing every information I need. It works a bit like the registry on MS-DOS systems where you have namespaces + key/values and any other "component" can put whatever they want in that registry using their own registry key (a bit like the Stash in pytest actually).
I create that dictionary as a single attribute in app.env upon 'builder-inited' and merge information in 'env-merge-info'. I can easily access whatever information is being stored using something like app.env.registry[namespace][key].
does not set parallel_read_safe but I guess there is no reason for this extension to disable parallelization, right
Probably because we didn't want the tutorial to add some unknown logic.
... but it gives no indication how/why to do that.
There is actually a docstring about what the environment version is meant for:
# This is increased every time an environment attribute is added
# or changed to properly invalidate pickle files.
ENV_VERSION = 61
See daade2715a81e21ca58d3356c5e17c66accc8f65, which redrafts some of the wording in https://www.sphinx-doc.org/en/master/extdev/#extension-metadata.
I think that this should close this issue by covering @mgeier's remaining points, save regarding where best for an extension to store state/data, which I think should be split into its own issue (especially as it isn't solely a documentation point), and may link to #13072 if we can come up with a better way of storing data than simply a free-for-all creating new attributes on env.
Matthias -- do you think there is anything further outstanding in terms of the documentation of env_version?
A
The docs now state "... if the extension uses the env object to store data", which is better than before, but it still doesn't say how exactly (by creating arbitrary attributes of arbitrary types?), nor what's best practice (use a unique prefix to avoid collisions with other extensions?), nor what has to be done to keep this all working in parallel builds (which is quite complicated and part of the reason why I brought up "Domains").
The docs of 'env_version' are probably not the best place to document all this, but it should be documented somewhere (maybe it already is?) and then linked there.
As for the remaining open points related to "Domain", we could of course open a new issue, but since there is already quite some discussion here, it might be better to keep this open?
I am still planning to investigate this further and to respond to the comments here, but recently I didn't have a lot of free time to do this.