bandersnatch `size_project_metadata` plugin causes some packages to not sync - e.g. pip + falcon

Open J-Phi1123 opened this issue 1 year ago • 10 comments

To begin with, I'm pretty sure this is user error, but I can't figure out what I'm doing wrong. Whatever is causing it is not obvious, and bandersnatch is not very helpful in identifying the issue. Thanks in advance for any help fixing this. I have been fighting with this for over 2 weeks now and am almost done with all of this.

I am trying to create a complete offline pip repo, and it seems to be working. But of course, out of the thousands of packages online, two are not being synced: pip and falcon.

I see "^pip\ " and "^falcon\ " names, and many others, in the "todo" file after `bandersnatch mirror --force-check` runs.

If I run `bandersnatch sync falcon`, falcon is still not present in pip/pypi/web/simple/falcon.

I recently turned on `json = true` and reran `bandersnatch mirror --force-check`. It created the json folder, but that folder does not contain a falcon or pip file.

I am currently running `bandersnatch verify` now that I have a json folder, which I guess will take a few days to finish, so unfortunately I can't run `bandersnatch sync --debug falcon` right now. From memory, the only thing that seemed different when running with --debug is that it mentioned filter rules: filter and file filter. There was definitely nothing about being unable to download anything. It seems to think the files were already downloaded?

Specs: bandersnatch 5.2.0; OS: Ubuntu 20.04; syncing to an external ext4 drive

Config:
'''
[plugins]
enabled = size_project_metadata

[size_project_metadata]
max_package_size = 100M

[mirror]
directory = /media/user/ExternalEXT4/pip/pypi
json = true
release-files = true
cleanup = false
master = https://pypi.org
timeout = 10
global-timeout = 1800
workers = 3
hash-index = false
stop-on-error = false
storage-backend = filesystem
verifiers = 3
compare-method = hash
diff-file = /media/user/ExternalEXT4/pip/pypi/mirrored-files
'''
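
For reference, the `max_package_size = 100M` value above works out to 104857600 bytes, the figure bandersnatch prints at startup in the debug log further down this thread. Here is a minimal sketch of that arithmetic. `parse_size` is a hypothetical helper written for this illustration, not bandersnatch's own parser, and it assumes the suffixes are binary multiples (powers of 1024):

```python
import re

# Assumption: suffixes are powers of 1024, which matches the
# 104857600-byte figure logged for "100M".
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Hypothetical helper: convert a string such as '100M' to bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMGT]?)B?\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit]


print(parse_size("100M"))  # 104857600
print(parse_size("1G"))    # 1073741824
```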

J-Phi1123 avatar Aug 05 '22 17:08 J-Phi1123

Will try and look into this over the weekend and see if I can reproduce ...

cooperlees avatar Aug 05 '22 18:08 cooperlees

Thanks for the help.

J-Phi1123 avatar Aug 05 '22 18:08 J-Phi1123

So I was able to repro this using the size_project_metadata plugin ... so the bug is in there ...

Debug run with plugin enabled:

crl-m1:~ cooper$ /tmp/tb/bin/bandersnatch -c /tmp/pypi/bandersnatch.conf --debug sync falcon 2>&1 | tee /tmp/bander_sync_falcon_debug
2022-08-06 18:50:11,894 DEBUG: Checking config for storage backend... (configuration.py:121)
2022-08-06 18:50:11,894 DEBUG: Found storage backend in config! (configuration.py:123)
2022-08-06 18:50:11,895 INFO: Selected storage backend: filesystem (configuration.py:129)
2022-08-06 18:50:11,895 DEBUG: Checking config for compare method... (configuration.py:161)
2022-08-06 18:50:11,895 DEBUG: Found compare method in config! (configuration.py:163)
2022-08-06 18:50:11,895 INFO: Selected compare method: hash (configuration.py:175)
2022-08-06 18:50:11,895 DEBUG: Checking config for alternative download mirror... (configuration.py:178)
2022-08-06 18:50:11,895 DEBUG: No alternative download mirror found in config. (configuration.py:183)
2022-08-06 18:50:11,895 DEBUG: Skip checking download-mirror-no-fallback because dependent optionis not set in config. (configuration.py:203)
2022-08-06 18:50:11,950 DEBUG: Initializing Master's aiohttp ClientSession (master.py:79)
2022-08-06 18:50:11,977 INFO: Initialized metadata plugin size_project_metadata to block projects > 104857600 bytes (metadata_filter.py:232)
2022-08-06 18:50:11,983 DEBUG: Adding json directories to bootstrap (mirror.py:536)
2022-08-06 18:50:11,983 INFO: Setting up mirror directory: /tmp/pypi/web/simple (mirror.py:546)
2022-08-06 18:50:11,984 INFO: Setting up mirror directory: /tmp/pypi/web/packages (mirror.py:546)
2022-08-06 18:50:11,984 INFO: Setting up mirror directory: /tmp/pypi/web/local-stats/days (mirror.py:546)
2022-08-06 18:50:11,984 INFO: Setting up mirror directory: /tmp/pypi/web/json (mirror.py:546)
2022-08-06 18:50:11,984 INFO: Setting up mirror directory: /tmp/pypi/web/pypi (mirror.py:546)
2022-08-06 18:50:11,984 DEBUG: Retrieving FileLock instance @ /tmp/pypi/.lock (filesystem.py:36)
2022-08-06 18:50:11,984 DEBUG: Acquiring FLock with timeout: 1 (mirror.py:551)
2022-08-06 18:50:11,984 INFO: Generation file missing. Reinitialising status files. (mirror.py:586)
2022-08-06 18:50:11,985 DEBUG: Modifying destination: /tmp/pypi/generation with: /tmp/pypi/generation.m6ggg53h (filesystem.py:122)
2022-08-06 18:50:11,985 INFO: Status file /tmp/pypi/status missing. Starting over. (mirror.py:608)
2022-08-06 18:50:11,985 INFO: Syncing with https://pypi.org. (mirror.py:59)
2022-08-06 18:50:11,985 INFO: No release filters are enabled. Skipping release filtering (mirror.py:80)
2022-08-06 18:50:11,985 INFO: No release file filters are enabled. Skipping release file filtering (mirror.py:82)
2022-08-06 18:50:11,985 DEBUG: Package syncer 0 started for duty (mirror.py:127)
2022-08-06 18:50:11,985 INFO: Fetching metadata for package: falcon (serial 0) (package.py:58)
2022-08-06 18:50:11,985 DEBUG: Getting /pypi/falcon/json (serial 0) (master.py:146)
2022-08-06 18:50:12,005 DEBUG: Package syncer 1 started for duty (mirror.py:127)
2022-08-06 18:50:12,005 DEBUG: Package syncer 1 emptied queue (mirror.py:134)
2022-08-06 18:50:12,005 DEBUG: Package syncer 2 started for duty (mirror.py:127)
2022-08-06 18:50:12,005 DEBUG: Package syncer 2 emptied queue (mirror.py:134)
2022-08-06 18:50:12,307 DEBUG: Package syncer 0 emptied queue (mirror.py:134)
2022-08-06 18:50:12,307 INFO: Generating global index page. (mirror.py:486)
2022-08-06 18:50:12,308 DEBUG: Writing temporary file /tmp/pypi/web/simple/.index.html.x64odmpl to target destination: /tmp/pypi/web/simple/index.html (filesystem.py:93)
2022-08-06 18:50:12,308 DEBUG: Closing Master's aiohttp ClientSession and waiting 0.1 seconds (master.py:99)
2022-08-06 18:50:12,410 INFO: 0 packages had changes (mirror.py:1051)
2022-08-06 18:50:12,410 INFO: Writing diff file to /tmp/pypi/mirrored-files (mirror.py:1061)

When I disabled the plugin, falcon downloaded fine. cmd: /tmp/tb/bin/bandersnatch -c /tmp/pypi/bandersnatch.conf --debug sync falcon

Full repro commands

mkdir /tmp/pypi
vim /tmp/pypi/bandersnatch.conf
- Changed dirs to be based out of /tmp/pypi
python3.10 -m venv /tmp/tb --upgrade-deps
/tmp/tb/bin/pip install bandersnatch==5.2.0

So we'd need to add more debugging info into the plugin code + plugin calling code to see what exactly is making it skip this package as a whole. Fixes welcome, I'm low on time to dig in and fix this plugin. As plugins are optional, I generally rely on contributions for them. I focus more on making core bandersnatch function (as I don't use bandersnatch + haven't for years and would really love to get a new maintainer)

cooperlees avatar Aug 07 '22 01:08 cooperlees

Thank you so much for the help, guys. I am going to switch gears and put the larger packages from pypistats in my config and go from there. Thank you so much for the support.

God bless open source!

J-Phi1123 avatar Aug 07 '22 03:08 J-Phi1123

Took your advice and used the pypistats tool to generate a list of large projects. Seems to be working great.

It would seem that the plugin looks at the size of all files in the package: if the sum of the bytes of all files across all versions of a package is greater than what you specify, it doesn't grab any of them. I had assumed the limit applied to each .whl or .tar.gz individually, because I did see a few pip .whl files that were >100MB, so I used that number thinking it was a sane maximum. The plugin recommended 1GB, but even that has been blocking packages I did want.
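
To make that behaviour concrete, here is a rough sketch of a whole-project size check. It is not the plugin's actual code. It assumes metadata shaped like PyPI's `/pypi/<project>/json` response (the endpoint fetched in the debug log above), where `releases` maps each version to a list of files that each carry a `size` in bytes. The function names and the example project are made up for illustration:

```python
from typing import Any

MIB = 1024**2


def total_project_size(metadata: dict[str, Any]) -> int:
    """Sum the size of every file in every release of a project."""
    return sum(
        file_info.get("size", 0)
        for files in metadata.get("releases", {}).values()
        for file_info in files
    )


def is_blocked(metadata: dict[str, Any], max_bytes: int) -> bool:
    """True when the project as a whole exceeds the limit."""
    return total_project_size(metadata) > max_bytes


# Made-up project: no single file is larger than 45 MiB.
example = {
    "releases": {
        "1.0": [{"size": 40 * MIB}, {"size": 30 * MIB}],
        "2.0": [{"size": 45 * MIB}],
    }
}

print(total_project_size(example) // MIB)  # 115
print(is_blocked(example, 100 * MIB))      # True
```

With a check like this, a project whose largest single file is 45 MiB is still skipped entirely under a 100M limit, because its releases add up to 115 MiB.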

Anyways, thanks again for the help and keep up the good work. You guys rock!

J-Phi1123 avatar Aug 09 '22 00:08 J-Phi1123

Thanks for digging in and explaining why things happened.

cooperlees avatar Aug 09 '22 01:08 cooperlees

I think we should advertise that we'd love a fix for the size_project_metadata plugin + it's a known issue.

cooperlees avatar Aug 16 '22 16:08 cooperlees

Is there an error? It does what it says. I just misunderstood the way it worked. I was thinking individual file sizes instead of entire project size.

J-Phi1123 avatar Aug 16 '22 16:08 J-Phi1123

Oh, so it SUMs() the whole project. I'll check if I can make the documentation clearer then :) Because I didn't get that from reading it either, or I missed it. Thanks for clearing that up too.

cooperlees avatar Aug 16 '22 16:08 cooperlees

Yeah, definitely, since you thought the same as I did and you are one of the contributors.

Having a plugin to blacklist/whitelist individual file sizes would be handy, though.
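
As a rough illustration of what I mean by per-file filtering (purely hypothetical: I am not aware of bandersnatch shipping a plugin like this, and the function name is made up), the check would drop only the oversized files and keep the rest of the project. Same assumed metadata shape as PyPI's JSON response, with a `size` in bytes on each file:

```python
from typing import Any

MIB = 1024**2


def filter_files_by_size(
    metadata: dict[str, Any], max_file_bytes: int
) -> dict[str, list[dict[str, Any]]]:
    """Return version -> files, keeping only files at or below the limit."""
    return {
        version: [f for f in files if f.get("size", 0) <= max_file_bytes]
        for version, files in metadata.get("releases", {}).items()
    }


# Made-up project with one oversized wheel.
example = {
    "releases": {
        "1.0": [
            {"filename": "demo-1.0.tar.gz", "size": 5 * MIB},
            {"filename": "demo-1.0-py3-none-any.whl", "size": 150 * MIB},
        ],
        "2.0": [{"filename": "demo-2.0.tar.gz", "size": 6 * MIB}],
    }
}

kept = filter_files_by_size(example, 100 * MIB)
print([f["filename"] for f in kept["1.0"]])  # ['demo-1.0.tar.gz']
print([f["filename"] for f in kept["2.0"]])  # ['demo-2.0.tar.gz']
```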

J-Phi1123 avatar Aug 16 '22 17:08 J-Phi1123