bandersnatch icon indicating copy to clipboard operation
bandersnatch copied to clipboard

'prerelease_release' plugin causes a KeyError in package metadata verify

Open RWoodring79 opened this issue 3 years ago • 5 comments

I am unable to perform a "verify" on my package mirror if I have the prerelease_release plugin enabled in my bandersnatch.conf file. If I remove this plugin, it works fine.

(venv) root@mirror:/mnt/pypi# bandersnatch --config ./bandersnatch.conf verify
2021-06-11 08:18:55,619 INFO: Starting verify for /mnt/pypi/pypi with 3 workers (verify.py:242)
2021-06-11 08:18:55,792 INFO: Parsing wix-protos-wixerd-api-gateway-server-api (verify.py:128)
2021-06-11 08:18:55,885 INFO: Initialized prerelease plugin with [re.compile('.+rc\\d+$'), re.compile('.+a(lpha)?\\d+$'), re.compile('.+b(eta)?\\d+$'), re.compile('.+dev\\d+$')] (prerelease_name.py:33)
Traceback (most recent call last):
  File "/mnt/pypi/venv/bin/bandersnatch", line 8, in <module>
    sys.exit(main())
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/main.py", line 213, in main
    return asyncio.run(async_main(args, config))
  File "/usr/lib/python3.8/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/main.py", line 127, in async_main
    return await bandersnatch.verify.metadata_verify(config, args)
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/verify.py", line 259, in metadata_verify
    await verify_producer(
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/verify.py", line 222, in verify_producer
    await asyncio.gather(
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/verify.py", line 212, in consume
    await verify(
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch/verify.py", line 155, in verify
    plugin.filter(pkg)
  File "/mnt/pypi/venv/lib/python3.8/site-packages/bandersnatch_filter_plugins/prerelease_name.py", line 39, in filter
    version = metadata["version"]
KeyError: 'version'

This is an except from my config file. If I comment out just the prerelease_release line, the verify operation starts as expected.

[plugins]
enabled =
    blocklist_project
    prerelease_release
    regex_project

This issue may be similar to https://github.com/pypa/bandersnatch/issues/505 as they are both failing because the metadata dictionary does not contain the key they are expecting.

RWoodring79 avatar Jun 11 '21 12:06 RWoodring79

Thanks for reporting. I guess we just have to check if it exists. I would have expected it to always. Is there a specific package you hit or random that cause they KeyError?

In any case, I guess adding the check and then deciding if we skip that version / part of the metadata we're in needs to be identified and correct behavior we deem to be performed (either ignore and move on or error bad metadata etc.)

cooperlees avatar Jun 11 '21 14:06 cooperlees

I can confirm this bug is present and can be reproduced via the docker image, here is the log from my mirror:

2021-06-28 09:59:18,311 INFO: Starting verify for /data/mirror with 3 workers (verify.py:242)
2021-06-28 09:59:26,091 INFO: Parsing 0 (verify.py:128)
2021-06-28 09:59:26,213 INFO: Initialized latest releases plugin with keep=3 (latest_name.py:34)
2021-06-28 09:59:26,220 INFO: Initialized prerelease plugin with [re.compile('.+rc\\d+$'), re.compile('.+a(lpha)?\\d+$'), re.compile('.+b(eta)?\\d+$'), re.compile('.+dev\\d+$')] (prerelease_name.py:33)
Traceback (most recent call last):
  File "/usr/local/bin/bandersnatch", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/main.py", line 219, in main
    return asyncio.run(async_main(args, config))
  File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/main.py", line 157, in async_main
    return await bandersnatch.verify.metadata_verify(config, args)
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/verify.py", line 259, in metadata_verify
    await verify_producer(
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/verify.py", line 222, in verify_producer
    await asyncio.gather(
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/verify.py", line 212, in consume
    await verify(
  File "/usr/local/lib/python3.9/site-packages/bandersnatch/verify.py", line 155, in verify
    plugin.filter(pkg)
  File "/usr/local/lib/python3.9/site-packages/bandersnatch_filter_plugins/latest_name.py", line 43, in filter
    version: str = metadata["version"]
KeyError: 'version'

It appears the package called "0" is causing the problems in my case, it's the same package name each time I try to re-run the verify so it does not appear to be random

RootLUG avatar Jun 28 '21 10:06 RootLUG

Ok, I finally have a working development environment set up. As someone who has never done python packages before, a DEVELOPERS.md file with some tips like "you have to run pip install --editable . after checking out the code" would have been really helpful. I know, you would be happy to accept a pull request... :)

It appears the PreReleaseFilter.filter is looking for a "version" entry in the metadata dictionary created from the web/json/{pkg name}.json file inthe wrong place. The filter is trying to read metadata["version"], but in my testing the version number is one layer deeper at metadata["info"]["version"]. Is it possible the format of this json file changed since the plugin was developed?

RWoodring79 avatar Jul 22 '21 20:07 RWoodring79

I doubt it has changed, but the ["version"] is a legacy field that must not be always filled. I would say we should:

  • Try to getattr("version", metadata)
  • If that fails, getattr("version", metadata["info"])

Or some kind of fallback logic like that for the plugin.

As for development instructions - We should update https://bandersnatch.readthedocs.io/en/latest/CONTRIBUTING.html#development-venv Thanks for noticing.

cooperlees avatar Jul 23 '21 15:07 cooperlees

After some more digging, my first theory about the version entry moving was wrong. Making the change I initially proposed will break the mirror functionality. When performing a mirror, process_package calls package.filter_all_releases(self.filters.filter_release_plugins()) but when performing a verify, the call to plugin.filter(pkg) does not include the package.filter_all_releases(...). That is important because filter_all_releases creates a new dictionary for each entry in the "releases" lists, and that dictionary (not the one read directly from JSON file) is the one with the "version" entry.

RWoodring79 avatar Jul 26 '21 02:07 RWoodring79