bandersnatch
bandersnatch copied to clipboard
'prerelease_release' plugin causes a KeyError in package metadata verify
I am unable to perform a "verify" on my package mirror if I have the prerelease_release
plugin enabled in my bandersnatch.conf file. If I remove this plugin, it works fine.
(venv) root@mirror:/mnt/pypi# bandersnatch --config ./bandersnatch.conf verify
2021-06-11 08:18:55,619 INFO: Starting verify for /mnt/pypi/pypi with 3 workers (verify.py:242)
2021-06-11 08:18:55,792 INFO: Parsing wix-protos-wixerd-api-gateway-server-api (verify.py:128)
2021-06-11 08:18:55,885 INFO: Initialized prerelease plugin with [re.compile('.+rc\\d+$'), re.compile('.+a(lpha)?\\d+$'), re.compile('.+b(eta)?\\d+$'), re.compile('.+dev\\d+$')] (prerelease_name.py:33)
Traceback (most recent call last):
File "/mnt/pypi/venv/bin/bandersnatch", line 8, in <module>
sys.exit(main())
File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/main.py", line 213, in main
return asyncio.run(async_main(args, config))
File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/main.py", line 127, in async_main
return await bandersnatch.verify.metadata_verify(config, args)
File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/verify.py", line 259, in metadata_verify
await verify_producer(
File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/verify.py", line 222, in verify_producer
await asyncio.gather(
File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/verify.py", line 212, in consume
await verify(
File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/verify.py", line 155, in verify
plugin.filter(pkg)
File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch_filter_plugins/prerelease_name.py", line 39, in filter
version = metadata["version"]
KeyError: 'version'
This is an except from my config file. If I comment out just the prerelease_release
line, the verify operation starts as expected.
[plugins]
enabled =
blocklist_project
prerelease_release
regex_project
This issue may be similar to https://github.com/pypa/bandersnatch/issues/505 as they are both failing because the metadata
dictionary does not contain the key they are expecting.
Thanks for reporting. I guess we just have to check if it exists. I would have expected it to always. Is there a specific package you hit or random that cause they KeyError?
In any case, I guess adding the check and then deciding if we skip that version / part of the metadata we're in needs to be identified and correct behavior we deem to be performed (either ignore and move on or error bad metadata etc.)
I can confirm this bug is present and can be reproduced via the docker image, here is the log from my mirror:
2021-06-28 09:59:18,311 INFO: Starting verify for /data/mirror with 3 workers (verify.py:242)
2021-06-28 09:59:26,091 INFO: Parsing 0 (verify.py:128)
2021-06-28 09:59:26,213 INFO: Initialized latest releases plugin with keep=3 (latest_name.py:34)
2021-06-28 09:59:26,220 INFO: Initialized prerelease plugin with [re.compile('.+rc\\d+$'), re.compile('.+a(lpha)?\\d+$'), re.compile('.+b(eta)?\\d+$'), re.compile('.+dev\\d+$')] (prerelease_name.py:33)
Traceback (most recent call last):
File "/usr/local/bin/bandersnatch", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/bandersnatch/main.py", line 219, in main
return asyncio.run(async_main(args, config))
File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/usr/local/lib/python3.9/site-packages/bandersnatch/main.py", line 157, in async_main
return await bandersnatch.verify.metadata_verify(config, args)
File "/usr/local/lib/python3.9/site-packages/bandersnatch/verify.py", line 259, in metadata_verify
await verify_producer(
File "/usr/local/lib/python3.9/site-packages/bandersnatch/verify.py", line 222, in verify_producer
await asyncio.gather(
File "/usr/local/lib/python3.9/site-packages/bandersnatch/verify.py", line 212, in consume
await verify(
File "/usr/local/lib/python3.9/site-packages/bandersnatch/verify.py", line 155, in verify
plugin.filter(pkg)
File "/usr/local/lib/python3.9/site-packages/bandersnatch_filter_plugins/latest_name.py", line 43, in filter
version: str = metadata["version"]
KeyError: 'version'
It appears the package called "0" is causing the problems in my case, it's the same package name each time I try to re-run the verify so it does not appear to be random
Ok, I finally have a working development environment set up. As someone who has never done python packages before, a DEVELOPERS.md
file with some tips like "you have to run pip install --editable .
after checking out the code" would have been really helpful. I know, you would be happy to accept a pull request... :)
It appears the PreReleaseFilter.filter
is looking for a "version" entry in the metadata
dictionary created from the web/json/{pkg name}.json
file inthe wrong place. The filter is trying to read metadata["version"]
, but in my testing the version number is one layer deeper at metadata["info"]["version"]
. Is it possible the format of this json file changed since the plugin was developed?
I doubt it has changed, but the ["version"] is a legacy field that must not be always filled. I would say we should:
- Try to getattr("version", metadata)
- If that fails, getattr("version", metadata["info"])
Or some kind of fallback logic like that for the plugin.
As for development instructions - We should update https://bandersnatch.readthedocs.io/en/latest/CONTRIBUTING.html#development-venv Thanks for noticing.
After some more digging, my first theory about the version entry moving was wrong. Making the change I initially proposed will break the mirror
functionality. When performing a mirror, process_package
calls package.filter_all_releases(self.filters.filter_release_plugins())
but when performing a verify, the call to plugin.filter(pkg)
does not include the package.filter_all_releases(...)
. That is important because filter_all_releases
creates a new dictionary for each entry in the "releases" lists, and that dictionary (not the one read directly from JSON file) is the one with the "version" entry.