
Add static configuration (``Sphinx.toml``)

Open choldgraf opened this issue 4 years ago • 126 comments

Background

One of the challenges in getting started with Sphinx is the conf.py file, for a few reasons:

  1. It is written in Python, and so it is Python-specific, even if the person writing the documentation is using a different language.
  2. It is a fully-flexible Python script, which can be overwhelming for users not accustomed to it.

Over the years, many other configuration formats have arisen; probably the two best known are YAML and TOML. For example, Jupyter Book provides a layer of YAML configuration on top of Sphinx, and users have responded that this is a really friendly pattern for beginners and experts alike. I wonder if Sphinx would be interested in allowing YAML or TOML configuration as well.

Describe the solution you'd like

In addition to the current config option of conf.py, add another option:

Allow config with YAML. I think it would be useful if Sphinx allowed for:

  • conf.yml. This would be read-in with PyYAML.

This file would be read in and converted directly to Python variables, as if it were written in Python (conf.py). For example:

# In conf.yml
key: value
mylist:
  - item1
  - item2
mydict:
  dk1: one
  dk2: two

would map onto

# In conf.py
key = "value"
mylist = ["item1", "item2"]
mydict = {"dk1": "one", "dk2": "two"}

Allow conf.py to be provided simultaneously. Some Sphinx builds will still need to run custom Python code (e.g., to set up some extensions etc). In this case, authors may wish to keep their "simple config" in the YAML file, and the complex config in pure Python.

If conf.py is supplied as well as conf.yml, then the values defined in conf.py will override anything in conf.yml.

So the order of operations would be:

  1. (if it exists) Read in variables from conf.yml
  2. Update with variables from conf.py if it exists, overwriting variables created in 1
  3. Everything else is the same...
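The order of operations above could be sketched roughly as follows. This is a hypothetical loader, not Sphinx's actual `Config` machinery: `load_layered_config` and its `parse_static` parameter are made-up names, PyYAML is assumed for parsing `conf.yml`, and `runpy.run_path` executes `conf.py` with the static values pre-seeded as globals, so anything `conf.py` assigns wins.

```python
import runpy
from pathlib import Path
from typing import Any, Callable, Optional


def load_layered_config(
    confdir: str,
    parse_static: Optional[Callable[[str], Any]] = None,
) -> dict[str, Any]:
    """Hypothetical loader: conf.yml first, then conf.py on top of it."""
    root = Path(confdir)
    namespace: dict[str, Any] = {}

    # Step 1: read static values from conf.yml, if present.
    static_file = root / "conf.yml"
    if static_file.exists():
        if parse_static is None:
            import yaml  # PyYAML, assumed to be installed

            parse_static = yaml.safe_load  # never yaml.load on config files
        data = parse_static(static_file.read_text(encoding="utf-8")) or {}
        if not isinstance(data, dict):
            raise TypeError("conf.yml must contain a mapping at the top level")
        namespace.update(data)

    # Step 2: run conf.py (if present) with the static values as its initial
    # globals; any name it assigns overwrites the value from step 1.
    py_file = root / "conf.py"
    if py_file.exists():
        namespace = runpy.run_path(str(py_file), init_globals=namespace)

    # Drop dunder entries such as __name__, __file__ and __builtins__.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}
```

Because conf.py sees the YAML values as ordinary globals, it can also build on them (e.g. `release = version + "rc1"`). A real implementation would filter further, since names like imported modules also end up in the namespace.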

Describe alternatives you've considered

I've tried creating a lightweight extension that allows this but didn't have success because of the way that extensions are activated.

I have also considered other documentation engines like mkdocs, which use YAML, but I'd like for this to be in the Sphinx ecosystem!

cc some others who have discussed this in the executablebooks/ repo: @pradyunsg @ericholscher @chrisjsewell

EDIT: I've updated the above description to remove mention of TOML, as I don't want that to derail the conversation here!

choldgraf avatar Mar 28 '21 17:03 choldgraf

Just wanted to chime in here and say this would be a great addition. I think it would improve the onboarding experience, and allow simple configurations to be machine-parsable. The dynamic nature of the Python configs definitely leads to a lot of customizations that are harder to support in varied development environments, which is a very common stumbling block for first-time Read the Docs users.

I'm in favor of adding it, and I also wanted to note that between the Executable Books & RTD teams, we'd probably be willing to implement and document this work, so we're mostly looking for a 👍 or 👎 from the team before starting a PR.

ericholscher avatar Mar 29 '21 17:03 ericholscher

A couple of thoughts from me:

  • This, with the cascading described, would be amazing!
  • Let’s only have a single file format, though, rather than allowing either YAML or TOML.
More unsolicited thoughts

IMO the choice for the file format comes down to “how much do you like nesting”. If you wanna have JSON-like arbitrary nesting, then YAML is likely a better fit.

I’m likely biased, but I do think conf.py’s generally flat structure maps very nicely onto TOML’s design. Most existing conf.py files are probably also almost-valid TOML files already!

Neither choice is wrong, both have gotchas, and I’d like it if we went with TOML here (it also helps the case for if/when I push to get a parser for that into the standard library).

pradyunsg avatar Mar 29 '21 20:03 pradyunsg

Indeed the main thing is to get a 👍 from the sphinx team, and I am definitely +1 😄

In terms of YAML vs TOML, I would note that both jupyter-book (_config.yml and _toc.yml) and RTD (.readthedocs.yml) currently use YAML for their configuration files, so at least for those use cases, I feel TOML would add overhead in understanding for the user.

chrisjsewell avatar Mar 29 '21 23:03 chrisjsewell

I don't have a fully formed opinion yet, but some thoughts:

  • conf.py cannot go away, both for backwards compatibility and because its dynamic nature can be quite useful sometimes, e.g., loading the version from somewhere else, auto-generating content, etc. So, as noted in the OP, this would be another layer of configuration loaded before conf.py.
  • Therefore, the machine-readability of a configuration would at best be a convention that one could use within a single project, not for arbitrary third-party projects. Theoretically, one could impose such conventions on one's own conf.py and read static config data up to a custom line comment, but this is indeed icky.
  • I think it is too much to add the special key for Python code. Putting static data into such a config file is fine, achieving easy machine-readability of that part (by convention). If you need arbitrary Python code, I would say to just stick it in conf.py.
  • The config file should not be misleading with respect to how the configuration really works. I'm not familiar with TOML, but it looks like it has "sections" which I'm not sure how maps to the configuration system. YAML seems to map more naturally.

jakobandersen avatar Mar 30 '21 10:03 jakobandersen

Therefore, the machine-readability of a configuration would at best be a convention that one could use at a single project, not for arbitrary third-party projects.

Maybe a terrible idea, but what if the conf.py location was made configurable and nullable in conf.toml/yaml (perhaps still defaulting to conf.py)? Then, in the conf.py == null case, machine-readability would actually be a thing.

Maybe the null case could even be the default, given that the toml/yaml is a new feature, so it shouldn't break existing projects.
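A rough sketch of how tooling might use such a setting. It assumes a hypothetical `python_config` key in the static file whose value is either a path relative to the configuration directory or null (`None` after parsing); the key name, its default and both helper functions are illustrative only. When the value is null, or the referenced file does not exist (similar to setuptools treating a missing setup.py as "everything is static"), a tool can read the configuration without executing any Python.

```python
from pathlib import Path
from typing import Any, Optional

DEFAULT_PY_CONFIG = "conf.py"  # assumed default when the key is absent


def python_config_path(static_cfg: dict[str, Any], confdir: str) -> Optional[Path]:
    """Return the Python config file to execute, or None if there is none.

    `python_config` is a hypothetical key used only for illustration.
    """
    value = static_cfg.get("python_config", DEFAULT_PY_CONFIG)
    if value is None:  # explicit null: nothing to execute
        return None
    return Path(confdir) / value


def is_machine_readable(static_cfg: dict[str, Any], confdir: str) -> bool:
    """True when the static file alone fully describes the configuration."""
    path = python_config_path(static_cfg, confdir)
    return path is None or not path.exists()
```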

hukkin avatar Mar 30 '21 10:03 hukkin

I guess the classic example of such co-existing files is setuptools' setup.py and setup.cfg. I would certainly check their implementation.

chrisjsewell avatar Mar 30 '21 10:03 chrisjsewell

What setuptools does is basically pretend there's a minimal setup.py file, if it doesn't exist. Notably, it's possible for tooling to detect whether setup.py exists and if it doesn't, it means that everything is declared statically.

For Sphinx, the minimal conf.py file would be empty, I guess?

pradyunsg avatar Mar 30 '21 11:03 pradyunsg

I'm not familiar with TOML, but it looks like it has "sections" which I'm not sure how maps to the configuration system. YAML seems to map more naturally

TOML does allow top-level keys with no section defined, though. The following is valid TOML:

project = "sphinx"
version = "0.0.1"

Sections are only required if there are dictionaries in conf.py, in which case they feel very natural to me.

Why I'm not a huge YAML fan is that YAML types are difficult to parse for humans and machines alike. Consider something like

- yes    # bool
- "no"   # string
- false  # bool
- .6432  # float
- "0.1"  # string
- null   # null
- none   # string
- ~      # null
- 0xabba # int

hukkin avatar Mar 30 '21 12:03 hukkin

I'm not familiar with TOML, but it looks like it has "sections" which I'm not sure how maps to the configuration system. YAML seems to map more naturally.

Well, TOML is literally designed to be a configuration file format. https://toml.io uses the tag line: "A config file format for humans". Think of it as unambiguous INI files. The clarity and unambiguous nature of the format is why pyproject.toml is TOML-based, same for Cargo.toml and more. :)

To address the specific question raised: All key-value pairs in a table [section] end up in a dictionary named section. In other words, it's how you do nesting.


Anyway, the reason I kept my thoughts on file format choices hidden-unless-you're-curious is that I didn't want to push this issue in that direction. I should've just omitted that whole thing.

Let's first wait for opinions on the general idea of static metadata in Sphinx, before discussing the exact file format further. :)

pradyunsg avatar Mar 30 '21 13:03 pradyunsg

Yeh, at the end of the day, JSON/YAML/TOML all basically map to each other 1-to-1, so it won't really affect the underlying code/logic to be written

chrisjsewell avatar Mar 31 '21 02:03 chrisjsewell

+0 for supporting a static config file. I don't think a Python script is a bad format for the config file, but it's reasonable to support a commonly used file format for sphinx. -1 for supporting the combination of .yaml and .py. It's too complicated, and I don't see the value in it.

And I don't have an opinion on YAML vs TOML, because I've never written a .toml file.

tk0miya avatar Apr 02 '21 16:04 tk0miya

@tk0miya thanks for your thoughts! Could you clarify why you don't want a combination of YAML and Python? I think the combination of YAML + Python fits the use-case that somebody wants 99% of their configuration in a well-structured config file, but also needs to run some custom Python code if a particular extension needs it or something. I think this is actually a pretty common use-case.

I think we should just scope this conversation to YAML since it is super common for config, and readthedocs uses it, and leave TOML to a later conversation

choldgraf avatar Apr 02 '21 16:04 choldgraf

sphinx. -1 for supporting the combination of .yaml and .py.

It also feels like it would be very difficult to migrate everyone from py to yaml?

chrisjsewell avatar Apr 02 '21 16:04 chrisjsewell

sphinx. -1 for supporting the combination of .yaml and .py.

It also feels like it would be very difficult to migrate everyone from py to yaml?

What would be the benefit of doing that anyway? I can to some degree see the point that each author may want to shift as much as possible to something more easily parsable, but maybe I don't fully get the problem with the current setup:

  1. It's in Python, maybe the user doesn't know Python: well, the same argument can be made about YAML or whatever other format.
  2. It can contain arbitrary code: sure, but as a new user you don't need to put arbitrary code there, and if you read another project, then that arbitrary code would just move somewhere else where you would also need to understand it.

Can you elaborate on the reasoning behind this?

jakobandersen avatar Apr 02 '21 16:04 jakobandersen

Here's a few thoughts:

benefits of YAML

  • Structured and easier to parse (so you can machine-read/write it much easier)
  • Language agnostic (so you don't give one language special status for most use-cases)
  • Extremely common (mkdocs, Hugo, readthedocs + almost any other SSG configure things with YAML. Many people already have a mental model of configuration with YAML). You are correct that perhaps a new user will need to "learn YAML", but because YAML is not tied to any one programming language, it is already very commonly used across many language ecosystems.

downsides of YAML

  • A decent number of "gotchas" (e.g. true, false, etc)
  • Inflexible (because it is just a data structure, it has no notion of execution etc)

benefits of Python

  • Flexible and extensible
  • Well-known language

downsides of Python

  • Not structured, so hard to parse
  • Less commonly used as a configuration step in similar tools (though nikola does use conf.py as well)
  • Complex, and can be intimidating to new users who must now learn a computer language
  • Implies that Sphinx is "just for Python documentation", which I don't think is true

So to me this sounds like a reasonable basis for a proposal: support YAML for simple configuration use cases, which are probably most use cases. For anything advanced, let people provide a conf.py for more complex configuration. YAML maps pretty cleanly onto variable creation in Python, and there are a ton of YAML readers out there, so this would be both low-maintenance and a good entry point into the Sphinx ecosystem for people who are used to configuring things with YAML. It would also make it easier for services to build on top of Sphinx, for example Jupyter Book or ReadTheDocs.

As an aside, one of the most common things people like about Jupyter Book (which is built on Sphinx), is that it supports YAML configuration. One reason I opened this issue is because so many people have told me they prefer YAML, that I think it is worth considering for core Sphinx, as I think it would be a benefit to many.

choldgraf avatar Apr 02 '21 17:04 choldgraf

@choldgraf, right, basically I agree with all those points. Where the disagreement/confusion comes from is how this will work, and the comment from @chrisjsewell:

It also feels like it would be very difficult to migrate everyone from py to yaml?

Maybe I misunderstand, but that seems to imply that only one of conf.py and conf.yaml should exist? One of the things I find really appealing about Sphinx, and Python in general, is the hackability. Sphinx is already quite extensible via documented means, and beyond that Python allows easy run-time hacking of whatever is needed until a proper solution can be found. Therefore I suggest that conf.py and conf.yaml must be able to co-exist, in the sense that variables in conf.py override those in conf.yaml. This keeps the implementation backwards compatible and still allows arbitrary code for initialisation.

jakobandersen avatar Apr 02 '21 17:04 jakobandersen

Maybe I misunderstand, but that seems to imply that only one of conf.py and conf.yaml should exist?

Oh no, I'm arguing for exactly the opposite lol

chrisjsewell avatar Apr 02 '21 17:04 chrisjsewell

@jakobandersen ah, in that case I totally agree with you; I think @chrisjsewell was suggesting they need to co-exist as well. I'll try to clarify this in the title and top comment too.

choldgraf avatar Apr 02 '21 17:04 choldgraf

Ah, all good then :-) As an add-on suggestion: sphinx-quickstart should be updated to generate both files, putting static data and associated comments in the YAML file, plus additional comments explaining the relationship between the files and the rationale for having both (i.e., the essence of this thread). The final output of the script, where it explains how to proceed, could also describe how to configure the project with the YAML and Python files.

jakobandersen avatar Apr 02 '21 17:04 jakobandersen

  • +1 for supporting static config file. I had thought about introducing conf.ini too, before YAML became as popular as it is now. This is because I felt that writing configuration in Python script is a subtle stumbling block for beginners.
  • -1 for supporting the combination of .yaml and .py. About the hackability of conf.py, I think it would be a good idea to be able to write a new extension mechanism for configuration, because I feel that allowing conf.yaml to override values in conf.py would introduce new stumbling blocks.

shimizukawa avatar Apr 04 '21 08:04 shimizukawa

allowing conf.yaml to override values in conf.py would introduce new stumbling blocks.

Hmm... I would've imagined setting a value in conf.yml and conf.py would result in an error OR cause the Python value to be used.

pradyunsg avatar Apr 04 '21 08:04 pradyunsg

Awesome! So, everyone is on board for (or ambivalent to) allowing static metadata! 🎉


I think there are a few things to decide on, AFAICT:

  • file semantics
  • file name
  • file format

Remembering the law of triviality, I'm gonna focus on semantics first. :)

Semantics

At least 2 folks have stated a -1 for allowing both the static metadata file, and Python file to exist at the same time, because it would get confusing when folks define keys in both. I agree! Specifying values in both is weird and confusing.

I disagree, though, that we shouldn't allow the files to complement each other when they both exist without overlapping. Allowing both to co-exist, and erroring out if the same value is defined in both (which doesn't add much code complexity), allows for a significantly better experience with the static metadata:

I write a nice Sphinx site, with only static metadata. After some time, I realize I do need some amount of dynamic behaviour (idk, need to add to sys.path for autodoc to work). If we don't allow both files to co-exist, this means that now I'll have to translate the YAML configuration into Python values, and start all over again. Compare that to just being able to add that sys.path.append in a newly created conf.py and moving on. After the first experience, I don't think I'd bother with the YAML files again. The second one is much nicer!
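The "error out on duplicates" rule is indeed small in code terms. A minimal sketch, assuming both files have already been read into plain dicts; `merge_configs` and `ConfigConflictError` are hypothetical names, not existing Sphinx APIs.

```python
from typing import Any


class ConfigConflictError(ValueError):
    """Hypothetical error: a setting is defined in both files."""


def merge_configs(static: dict[str, Any], dynamic: dict[str, Any]) -> dict[str, Any]:
    """Combine static (YAML/TOML) and conf.py values, rejecting overlaps."""
    overlap = sorted(static.keys() & dynamic.keys())
    if overlap:
        raise ConfigConflictError(
            "defined in both the static file and conf.py: " + ", ".join(overlap)
        )
    # With no overlapping keys, the merge order does not matter.
    return {**static, **dynamic}
```

In practice `dynamic` would first be filtered down to registered config values, so helper names such as `sys` (from `import sys` before a `sys.path.append`) never count as conflicts.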

File name

In conf.yml

Let's use sphinx instead of conf in the filename?

That way, it's much clearer what tool is being used. Searching for the filename on search engines will actually yield useful results, which likely won't happen for conf.yml.

File format

Full disclosure: I am the primary maintainer of TOML now. And, unsurprisingly, I'd like to advocate for adopting TOML over YAML here.

Excuse me for being lazy, and quoting some pieces of writing:

In the toml-lang/toml README -- this is the only quote where I've contributed wording.

TOML shares traits with other file formats used for application configuration and data serialization, such as YAML and JSON. TOML and JSON both are simple and use ubiquitous data types, making them easy to code for or parse with machines. TOML and YAML both emphasize human readability features, like comments that make it easier to understand the purpose of a given line. TOML differs in combining these, allowing comments (unlike JSON) but preserving simplicity (unlike YAML).

Because TOML is explicitly intended as a configuration file format, parsing it is easy, but it is not intended for serializing arbitrary data structures. TOML always has a hash table at the top level of the file, which can easily have data nested inside its keys, but it doesn't permit top-level arrays or floats, so it cannot directly serialize some data. There is also no standard identifying the start or end of a TOML file, which can complicate sending it through a stream. These details must be negotiated on the application layer.

INI files are frequently compared to TOML for their similarities in syntax and use as configuration files. However, there is no standardized format for INI and they do not gracefully handle more than one or two levels of nesting.

Comparison of configuration file languages, done during the PEP 518 discussion

Personally, I would sum up the above as:

|                             | YAML | JSON | CP  | TOML |
|-----------------------------+------+------+-----+------|
| Well-defined                | yes  | yes  |     | yes  |
| Real data types             | yes  | yes  |     | yes  |
| Sensible commenting support | yes  |      |     | yes  |
| Consistent unicode support  | yes  | yes  |     | yes  |
| Makes humans happy          |      |      | yes | yes  |

[snip] Given all of the above, I tend to think the trade-offs fall in favor of TOML.

PEP 518's discussion of "why not YAML"

One is that the specification is large: 86 pages if printed on letter-sized paper. That leaves the possibility that someone may use a feature of YAML that works with one parser but not another. It has been suggested to standardize on a subset, but that basically means creating a new standard specific to this file which is not tractable long-term.

Two is that YAML itself is not safe by default. The specification allows for the arbitrary execution of code which is best avoided when dealing with configuration data. It is of course possible to avoid this behavior -- for example, PyYAML provides a safe_load operation -- but if any tool carelessly uses load instead then they open themselves up to arbitrary code execution. While this PEP is focused on the building of projects which inherently involves code execution, other configuration data such as project name and version number may end up in the same file someday where arbitrary code execution is not desired.

Example demonstrating how YAML can be ambiguous in weird ways, from earlier in this thread

Why I'm not a huge YAML fan is that YAML types are difficult to parse for humans and machines alike. Consider something like

- yes    # bool
- "no"   # string
- false  # bool
- .6432  # float
- "0.1"  # string
- null   # null
- none   # string
- ~      # null
- 0xabba # int

One fun example of this is the so-called "Norway problem": under YAML 1.1 rules, the unquoted country code NO is read as the boolean false.

Finally, thanks to pyproject.toml, a lot of Python tooling is going to be configured through TOML going forward. It'd be nice for Sphinx to hop on board as well! :)

pradyunsg avatar Apr 04 '21 09:04 pradyunsg

At least 2 folks have stated a -1 for allowing both the static metadata file, and Python file to exist at the same time

Maybe I misunderstood, but I got the impression that @tk0miya wanted to completely remove the python file, rather than just restrict its use?

chrisjsewell avatar Apr 04 '21 10:04 chrisjsewell

realize I do need some amount of dynamic behaviour

One dynamic thing I actually do a lot in projects is connect a handler to the builder-inited event that runs sphinx-apidoc, to automate building the API documentation pages (which I gitignore from the repo). But maybe I am missing a better way to do this?

chrisjsewell avatar Apr 04 '21 10:04 chrisjsewell

But maybe I am missing a better way to do this?

Well, if you're missing something, then it's you and I both. :)

One of the nice things about conf.py is that it also basically serves as an extension, once you add the setup function.

pradyunsg avatar Apr 04 '21 14:04 pradyunsg

If Sphinx does allow for configuration from static metadata, I would suggest using Python literals as the file format; see the example conf.py below (sphinx_static_config.zip). Since this format is a subset of the Python language, everyone familiar with conf.py will already know how to encode configuration data, rather than having to learn to express values in YAML/TOML/JSON/etc.

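import ast
import os


def setup(app):
    # Look for the literals file next to conf.py, not in the current directory
    path = os.path.join(app.confdir, "sphinx-conf.pylit")
    with open(path, encoding="utf-8") as f:
        # The file holds `"key": value,` entries; wrapping them in braces lets
        # ast.literal_eval parse them as a dict without executing any code
        cfg = ast.literal_eval("{\n" + f.read() + "\n}")
    # Copy each value onto the Sphinx config object
    for key, value in cfg.items():
        app.config[key] = value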
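For illustration, here is what a hypothetical sphinx-conf.pylit body might look like (dict entries without the surrounding braces) and how `ast.literal_eval` treats it. Because only literals are accepted, a function call in the file is rejected rather than executed.

```python
import ast

# Hypothetical contents of sphinx-conf.pylit: dict entries without the braces.
PYLIT = '''
"project": "example",
"extensions": ["sphinx.ext.autodoc"],
"html_theme_options": {"show_related": False},
'''

cfg = ast.literal_eval("{\n" + PYLIT + "\n}")
print(cfg["extensions"])  # ['sphinx.ext.autodoc']

# Anything that is not a literal, such as a function call, raises ValueError.
try:
    ast.literal_eval('{"version": __import__("os").getcwd()}')
except ValueError as exc:
    print("rejected:", exc)
```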

bjones1 avatar Apr 05 '21 12:04 bjones1

I would suggest using Python literals as the file format

There's probably two groups of people:

  a) non-Python people using MkDocs (or similar) instead of Sphinx, because it uses a widely used static configuration format
  b) Python people unfamiliar with YAML/TOML/JSON

I'd imagine a) is the group we're targeting here and also probably the larger group. Also, group b) already has the conf.py... So I'd stick to either TOML or YAML.

Also YAML, TOML, JSON etc. already have existing tools (parsers, formatters etc) in a variety of programming languages, something that Python literals don't.

hukkin avatar Apr 05 '21 14:04 hukkin

Notice that "The recommended extension for files containing YAML documents is .yaml ( http://yaml.org/faq.html ) and this has been the case since at least Sep 2006 ( https://web.archive.org/web/20060924190202/http://yaml.org/faq.html )." (instead of .yml) (copied from https://github.com/readthedocs/readthedocs.org/issues/7460#issue-694055600)

astrojuanlu avatar Apr 05 '21 15:04 astrojuanlu

(In any case, I'd support TOML over YAML as well, but I have no horse in this race :) )

astrojuanlu avatar Apr 05 '21 15:04 astrojuanlu

I think the combination of YAML + Python fits the use-case that somebody wants 99% of their configuration in a well-structured config file, but also needs to run some custom Python code if a particular extension needs it or something.

Could you give me an example? I think "custom Python code" is not configuration, so it would be better to put it in an extension instead. I can understand wanting to use the combination of config.yaml and ext.py, but I can't imagine a case where both config.yaml and config.py are needed.

Compare that to just being able to add that sys.path.append in a newly created conf.py and moving on. After the first experience, I don't think I'd bother with the YAML files again. The second one is much nicer!

I think it's better to add a new configuration to append sys.path to the YAML file.
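A sketch of what such a setting might do, assuming a hypothetical `python_path` list in the static file; the key name and the `apply_python_path` helper are illustrative, not part of Sphinx. Relative entries are resolved against the configuration directory, mirroring the common conf.py idiom `sys.path.insert(0, os.path.abspath(".."))`.

```python
import sys
from pathlib import Path
from typing import Any


def apply_python_path(static_cfg: dict[str, Any], confdir: str) -> list[str]:
    """Prepend entries of a hypothetical `python_path` key to sys.path.

    Returns the entries that were actually added.
    """
    new_entries: list[str] = []
    for entry in static_cfg.get("python_path", []):
        # Relative entries are resolved against the configuration directory.
        resolved = str((Path(confdir) / entry).resolve())
        if resolved not in sys.path and resolved not in new_entries:
            new_entries.append(resolved)
    # Prepend in the listed order so earlier entries take priority.
    sys.path[0:0] = new_entries
    return new_entries
```

A YAML file could then contain `python_path: [".."]` instead of needing a conf.py just for the `sys.path` tweak.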

File name

Let's use sphinx instead of conf in the filename?

+1

File format

IMO, YAML is more widely used than TOML. One of the goals of this issue is supporting a commonly used file format for Sphinx configuration, so I'd like to vote for YAML. But I also think pyproject.toml is the future of Python, so it would be fine if we supported both sphinx.yaml and pyproject.toml.

Maybe I misunderstood, but I got the impression that @tk0miya wanted to completely remove the python file, rather than just restrict its use?

No, I don't want to drop conf.py support. There are tons of conf.py files in the world, so dropping it would be terrible. What I objected to is using YAML and conf.py at the same time.

tk0miya avatar Apr 07 '21 14:04 tk0miya