Add subcmd to use metadata to roughly calculate the size of the local bandersnatch mirror

Open leochen12-rgb opened this issue 2 years ago • 3 comments

While I am synchronizing my mirror, I can see the official total size of PyPI at https://pypi.org/stats/. However, running du or duc on the local mirror directory takes too long. Is there a more convenient way to measure its size?

leochen12-rgb avatar Dec 11 '22 13:12 leochen12-rgb

Howdy,

This isn't really a bandersnatch question. It is a limitation of having lots of small files on your storage backend.

The only ideas we could possibly try:

  • Use the JSON metadata in parallel, check whether a simple dir exists for each package and, if so, sum up the sizes of all its files
    • Likely to be buggy: for example, if you use filtering, it won't be taken into account
  • Use the JSON metadata in parallel and check whether the files exist, but I think this will be just as expensive as du (though I'm not sure of all the operations du does under the covers)
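A rough sketch of the first idea, in Python. It is not part of bandersnatch. It assumes the mirror was synced with JSON saving enabled, that each metadata file follows the PyPI JSON API layout (a "releases" mapping whose file entries carry a "size" field), and that the metadata files sit in one flat directory such as web/json. The names package_size and estimate_mirror_size are hypothetical, and the optional simple_dir check assumes a non-hashed simple index with one directory per package.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Optional


def package_size(json_path: Path) -> int:
    """Sum the "size" field of every release file listed in one metadata file."""
    try:
        data = json.loads(json_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return 0  # unreadable or malformed metadata: skip it
    if not isinstance(data, dict):
        return 0
    releases = data.get("releases") or {}
    return sum(
        entry.get("size") or 0
        for files in releases.values()
        for entry in files
    )


def estimate_mirror_size(
    json_dir: Path, simple_dir: Optional[Path] = None, workers: int = 16
) -> int:
    """Estimate the mirror size in bytes from metadata alone.

    If simple_dir is given, only packages that have a directory of the same
    name under it are counted (assumption: non-hashed simple index layout).
    Release files themselves are never stat()ed, so filtered or missing
    files are still counted.
    """
    paths = [p for p in json_dir.iterdir() if p.is_file()]
    if simple_dir is not None:
        paths = [p for p in paths if (simple_dir / p.name).is_dir()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(package_size, paths))
```

Calling estimate_mirror_size(Path("web/json"), simple_dir=Path("web/simple")) would return a byte count (the paths here are assumptions about the mirror layout). Because the files are never checked on disk, the result is an upper-bound style estimate and will overcount when filtering is in use.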

Another hack I've generally recommended is making a dedicated partition or volume for each part of bandersnatch's storage, e.g. putting the simple and packages directories on their own filesystems, so that df -h can give quicker insight too.

  • If you use hash-index = true you could also create a volume/file system per shard to get further insight
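With a dedicated filesystem, the same number df reports can also be read from Python without walking any files. A minimal sketch, assuming the given path is the mount point of a filesystem that holds nothing but mirror data (used_bytes is a hypothetical helper name):

```python
import shutil


def used_bytes(mount_point: str) -> int:
    """Return the bytes in use on the filesystem that contains mount_point.

    This counts the whole filesystem, so it only approximates the mirror
    size when the filesystem is dedicated to that part of the mirror.
    """
    return shutil.disk_usage(mount_point).used
```

This answers from filesystem statistics rather than per-file metadata, which is why it stays fast regardless of how many small files the mirror holds.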

I don't have the cycles to look into these ideas, but I would take a PR to add docs or a bandersnatch du like command that works out the sizes quicker, if possible. But I feel we'd need to use a lower-level language than Python to get true speed here. Will leave this open in case someone smarter comes along with better ideas.

cooperlees avatar Dec 11 '22 18:12 cooperlees

Thank you for your reply. I look forward to a du parameter being added to bandersnatch.

leochen12-rgb avatar Dec 12 '22 02:12 leochen12-rgb

Awesome. Yeah, I'll be surprised if it's much faster, and it will be hard to get accurate without checking if the files exist, which is the expensive part. Still, it might surprise us and be much quicker than du …

cooperlees avatar Dec 12 '22 03:12 cooperlees