bandersnatch icon indicating copy to clipboard operation
bandersnatch copied to clipboard

Many options do not have a configured fallback value even if the documentation says so

Open ichard26 opened this issue 4 years ago • 10 comments

What's the issue:

Some of the configurable knobs don't actually have a fallback value defined even though the documentation explicitly states these values do have a default. Case in point, let's take a gander at the [mirror].timeout value -- it's documented as having a default of 10 seconds:

The default value for this setting is 10 seconds. https://bandersnatch.readthedocs.io/en/latest/mirror_configuration.html#timeout

buttt looking at the source code it actually doesn't have one: https://github.com/pypa/bandersnatch/blob/3b78e040a330e3308389e3d32d8c13e8b8c41af5/src/bandersnatch/mirror.py#L1012-L1014

Potential solutions:

  • Fix the lack of fallback=$val for all of the options that are documented having a default
  • Fix the documentation to avoid stating these options have a default

I personally prefer the first one since having defaults makes sense so fixing the code to actually implement them is a good idea ... although this is a good opportunity to do a configuration adiut and see if any options shouldn't exist / or have a default / should have their default changed.

Additional context:

I was notified of this issue from a quick exchange in the Python Discord server: https://discord.com/channels/267624335836053506/463035462760792066/877942046952947722

ichard26 avatar Aug 20 '21 02:08 ichard26

The quoting of default refers to the default.conf file have in the repo for people to use to get started.

cooperlees avatar Sep 09 '21 04:09 cooperlees

I just hit this same issue — I wrote a config file from scratch based on the documentation, and was surprised to find that "the default value for this setting is [some value]" didn't mean that the setting had a default value. I never even saw default.conf until I went to file an issue about the defaults not working as documented and found this issue.

jbg avatar Jan 18 '23 12:01 jbg

PRs to link docs to default.conf and make configparser set default both welcome.

cooperlees avatar Jan 18 '23 15:01 cooperlees

I've started looking at the documentation and code for mirror configuration. These are the options I've found so far:

Option Name Type Doc'd? Default? (Doc/Actual) Source File
json bool yes - / False configuration.py
root_uri str yes "https://files.pythonhosted.org/"[^1] "
diff-file str yes - / "" "
diff-append-epoch bool yes - / False "
storage-backend str no[^2] "filesystem"/"filesystem" "
digest_name str no - / "sha256" "
cleanup bool no - / False "
release-files bool yes True / True "
compare-method str yes "hash"/ "hash" "
download-mirror str yes - / "" "
download-mirror-no-fallback bool no - / False "
simple-format str yes "ALL" / "ALL" "
master str yes "https://pypi/org" / !!! mirror.py
timeout int yes 10 / !!! "
global-timeout int yes 18000 / !!! "
proxy str yes - / "" "
stop-on-error bool yes - / False "
workers int yes 3 / !!! "
hash-index bool yes False / !!! "
keep_index_versions bool no - / 0 "

There are a handful that aren't documented (that is, at least not on the mirror configuration page), as well as the ones that are set in default.conf but the don't have a default/fallback when a user's config file is loaded. I intend to update mirror_configuration.md in the docs as well as look into adding defaults in code for the options that don't have one.

Presently there is a function in configuration.py that validates most of the options while others are only read in mirror.py. I don't know if it would be a problem to consolidate all configuration loading into one place.

[^1]: root_uri's default value interacts with release-files. The description of the default in the docs matches the code, which is in pseudocode something like: if not release-files and not root-uri then "https://files.pythonhosted.org/" else "" [^2]: Option and default are documented w/ storage backends, but not in the list of mirror configuration options. Could be listed with a cross reference?

flyinghyrax avatar Feb 07 '24 01:02 flyinghyrax

Thanks for digging in.

Presently there is a function in configuration.py that validates most of the options while others are only read in mirror.py. I don't know if it would be a problem to consolidate all configuration loading into one place.

Welcome to move all to configuration if you feel there is an advantage

root_uri's default value interacts with release-files. The description of the default in the docs matches the code, which is in pseudocode something like:

Is there a missing question here? Pseudo code seems right.

Option and default are documented w/ storage backends, but not in the list of mirror configuration options. Could be listed with a cross reference?

Sure - cross reference. But lets avoid as much duplication as possible

cooperlees avatar Feb 07 '24 03:02 cooperlees

Hi @cooperlees , good to hear from you!

Presently there is a function in configuration.py that validates most of the options while others are only read in mirror.py. I don't know if it would be a problem to consolidate all configuration loading into one place.

Welcome to move all to configuration if you feel there is an advantage

I'll look in to this. I have one idea I'd like to hear your thoughts on. I was wondering if it would make sense to always load the module's default.conf, whether or not there is a user config file, and have values from the user config overwrite/take precedence. That would make default.conf a single source for what the default values are, but it could also mean making default.conf more extensive and less suited for someone first getting started.

Otherwise I'm happy to just consolidate things into configuration.py a bit, but seeing where default.conf gets loaded when there's no file specified got me thinking about trying to expand its use.

root_uri's default value interacts with release-files. The description of the default in the docs matches the code, which is in pseudocode something like:

Is there a missing question here? Pseudo code seems right.

No missing question! I originally wrote the table as a list for my own reference, and when I posted it my footnotes came along too. Glad to hear my interpretation seems accurate. 🙂

Option and default are documented w/ storage backends, but not in the list of mirror configuration options. Could be listed with a cross reference?

Sure - cross reference. But lets avoid as much duplication as possible

Sounds good. 👍

Can't promise a timeline for a PR, but I'm happy I've found something I feel like I could contribute to, so I feel optimistic about finishing this.

flyinghyrax avatar Feb 08 '24 00:02 flyinghyrax

I was wondering if it would make sense to always load the module's default.conf, whether or not there is a user config file, and have values from the user config overwrite/take precedence.

I like this idea a lot. Please lets unit test all scenarios tho please. Should be able to make configuration.py very clean as it seems you can just call read twice with configparser.

  • Please make sure we include default.conf in our packages too (I think we do)
def merge_ini_files(file1: str, file2: str):
    config = configparser.ConfigParser()

    # Read the first INI file
    config.read(file1)

    # Read the second INI file
    config.read(file2)
    ...

cooperlees avatar Feb 08 '24 04:02 cooperlees

I've opened a draft pull request (#1669). I've made it a draft because:

  • I do not know if it is preferable to review documentation updates separately from code changes, or to put them together in one PR.
  • I maybe got a bit carried away rewriting/reorganizing and it may be too big of a change relative to the initial goal here.

Working on the mirror documentation definitely increased my understanding of Bandersnatch's behavior with different options, but of course let me know if I've introduced any inaccuracies.

I was wondering if it would make sense to always load the module's default.conf, whether or not there is a user config file, and have values from the user config overwrite/take precedence.

I like this idea a lot. Please lets unit test all scenarios tho please. Should be able to make configuration.py very clean as it seems you can just call read twice with configparser.

The only concern I have thus far is root_uri - I think it needs to distinguish whether it is set in the user configuration file or not, so loading a default value for it can't be completely transparent. I'm going to play around with the idea anyway and see how far I get.

flyinghyrax avatar Feb 18 '24 21:02 flyinghyrax

I do not know if it is preferable to review documentation updates separately from code changes, or to put them together in one PR.

Let's do the docs first but maybe

I maybe got a bit carried away rewriting/reorganizing and it may be too big of a change relative to the initial goal here.

All good to me - but let's just have it document reality and change the docs to state when we make the default.conf actually be the source of truth for all default options

I've also commented on the PR suggestions for some enhancements.

root_uri - I think we leave it required in the config maybe as I think it's good to make people create their own config file ...

  • I think directory (for a filesystem mirror) needs to me mandatory too

Thanks again.

cooperlees avatar Feb 20 '24 02:02 cooperlees

All good to me - but let's just have it document reality and change the docs to state when we make the default.conf actually be the source of truth for all default options

This makes sense! That will leave the docs "correct" in the interim while I figure out the configuration implementation bits.

I will go ahead and change the docs PR so the options that are currently missing defaults are marked as required.

flyinghyrax avatar Feb 24 '24 17:02 flyinghyrax

@cooperlees - I've been fiddling around with config validation for a bit and I'm struggling some with how diff-file works.

  1. It looks like it should support diff-file = {{section_option}}/newstuff.txt (unittest.conf, test_storage_plugins.py), but the implementation breaks if there is anything in the value other than the reference, e.g. diff-file = {{section_option}} works but diff-file = {{section_option}}/mirrored-files doesn't. It follows the 'Invalid section reference' path and defaults to '(directory)/mirrored-files'. (This makes it look like it is working if the value happens to be {{mirror_directory}}/mirrored-files.)
  2. The mirror subcommand mirror.py:mirror seems to expect diff_file is optional and has "if diff_file" checks to not write the diff list if there's no path set. But validate_config_values sets diff_file to the empty string, and in the mirror function has diff_file = storage_plugin.PATH_BACKEND(config_values.diff_file_path) before any of the conditional checks, and Path("") evaluates to the same as Path(".") and is "truthy". So it seems like in practice a diff file is always written.

For both of these it is straightforward enough to "fix" them, but doing so changes externally observable behavior in a potentially incompatible way:

  • Although (1) has fallen out of the documentation, I'm imagining someone somewhere is using it and even using the buggy (?) behavior (https://xkcd.com/1172/). I can maintain the behavior exactly as is, fix the parsing/interpolation (I've written and tested an alternate version), or it could be probably be removed since configparser has ExtendedInterpolation that satisfies the use case.
  • The current behavior for (2) is pretty harmless, basically that users who don't have a diff-file option set are getting one anyway, but the file can just be ignored. On the other hand changing it so diff-file is actually optional and isn't written if there's no path set would break things for anyone relying on the current behavior. I could also update the code to expect that there will always be a diff-file path, making the current behavior "official", so to speak.

Thoughts on a direction to take?

(Also should I open a new issue for this config implementation stuff?)

flyinghyrax avatar Mar 23 '24 21:03 flyinghyrax

Thoughts on a direction to take?

I prefer to just fix the magic file appearing and mark it loudly in change log. If anyone is relying on that, I feel it's better to break them and make them be explicit in their config file on their needs.

TIL that Path("") == Path(".") and thus truthy. "" is false, so yeah. Fun. I bet this was once a string before I pathlib'ed everything.

Thanks for noting these found bugs and asking questions on fun cases like this.

cooperlees avatar Mar 25 '24 20:03 cooperlees

Happy to! 👍

And yeah the Path(“”) thing… I didn’t know either, had to try it in a REPL a few different ways to convince myself. 😆

flyinghyrax avatar Mar 25 '24 22:03 flyinghyrax