Generate a breakdown of average linux package manager directory structure and sizes
For @dirkmc's work on https://github.com/ipfs/package-managers/issues/77 it'd be very helpful to get an idea of the kind of sizes and shapes of the directory structures of various linux package manager repositories.
This likely involves rsyncing copies of the repositories listed in https://github.com/ipfs/package-managers/issues/75 and inspecting them (du may be useful here), then producing a histogram of the output for each one and possibly an average across all of them.
This will allow us to build some scripts that can generate repo-like directory structures without needing to download 1TB+ of real data.
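As a rough illustration of what such a generator script could look like, here is a minimal sketch. The function name make_fake_repo, the "<size in bytes> <file count>" input format, and the one-directory-per-size layout are all assumptions made for this example, not something decided in this issue.

```shell
# make_fake_repo DIR (hypothetical helper): read "<size in bytes> <file count>"
# lines on stdin and create that many zero-filled files of that size under
# DIR/<size>/. The layout is an assumption; it does not mimic a real repo tree.
make_fake_repo() {
  dest=$1
  while read -r size count; do
    mkdir -p "$dest/$size"
    i=0
    while [ "$i" -lt "$count" ]; do
      # head -c writes real bytes, so large buckets take real disk space
      head -c "$size" /dev/zero > "$dest/$size/file-$i"
      i=$((i + 1))
    done
  done
}
```

For example, printf '1024 3\n4096 1\n' | make_fake_repo /tmp/fake-arch creates three 1 KiB files and one 4 KiB file.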
If the directory you'd like to analyze is called arch, here is a collection of handy commands for generating a breakdown of sizes and structures:
breakdown of file sizes
cmd: find arch -type f -print0 | xargs -0 ls -l | awk '{ n=int(log($5)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s: %6d\n", a[1],substr("kMGTPEZY",a[2]+1,1),$2) }'
result (each label is the lower bound of a power-of-two bucket; everything under 2k is counted in the 1k bucket):
1k: 11110
2k: 60
4k: 359
8k: 1042
16k: 1235
32k: 1374
64k: 1229
128k: 1260
256k: 1247
512k: 916
1M: 705
2M: 631
4M: 374
8M: 218
16M: 197
32M: 114
64M: 69
128M: 33
256M: 17
512M: 10
1G: 2
2G: 1
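The bucketing step can also be written as a small standalone function, which makes it easier to reuse on other repos. This is a sketch, not the command used for the numbers above: size_histogram is a hypothetical name, it prints bucket lower bounds in raw bytes rather than human-readable labels, and it finds the bucket with a loop instead of log(), which sidesteps log(0) for empty files and floating-point rounding at exact powers of two.

```shell
# size_histogram (hypothetical helper): read one file size in bytes per line
# on stdin and print "<bucket lower bound in bytes> <file count>" for each
# power-of-two bucket. Sizes below 2 KiB all land in the 1024 bucket, matching
# the clamp in the one-liner above.
size_histogram() {
  awk '
    {
      n = 10                             # smallest bucket is 2^10 = 1 KiB
      while (2 ^ (n + 1) <= $1) n++      # largest n with 2^n <= size
      count[n]++
    }
    END { for (n in count) printf("%d %d\n", 2 ^ n, count[n]) }
  ' | sort -n
}
```

One way to feed it: find arch -type f -print0 | xargs -0 ls -l | awk '{ print $5 }' | size_histogram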
breakdown of file sizes as .csv file
cmd: find arch -type f -print0 | xargs -0 ls -l | awk '{ n=int(log($5)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s,%6d\n", a[1],substr("kMGTPEZY",a[2]+1,1),$2) }' > sizes.csv
result: a file called sizes.csv
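which you can visualize in the terminal with: perl -lane 'print $F[0], "\t", "=" x ($F[1] / 200)' sizes.csv
(the divisor sets the scale, here one = per 200 files, which is what the output below corresponds to)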
1k, =======================================================
2k,
4k, =
8k, =====
16k, ======
32k, ======
64k, ======
128k, ======
256k, ======
512k, ====
1M, ===
2M, ===
4M, =
8M, =
16M,
32M,
64M,
128M,
256M,
512M,
1G,
2G,
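If perl is not at hand, the same bar chart can be drawn with awk. This is a sketch under assumptions: csv_bars is a hypothetical name, the input is the padded "label,count" format written to sizes.csv above, and the default scale of 200 files per = is simply chosen to suit this data set.

```shell
# csv_bars FILE [FILES_PER_MARK] (hypothetical helper): print each label from
# a "label,count" CSV followed by a tab and one "=" per FILES_PER_MARK files
# (default 200). Splitting on the comma keeps this working even when a count
# is wide enough to leave no padding space after the comma.
csv_bars() {
  awk -F, -v per="${2:-200}" '{
    label = $1
    gsub(/ /, "", label)                 # drop the padding around the label
    bar = ""
    for (i = 0; i < int($2 / per); i++) bar = bar "="
    printf("%s\t%s\n", label, bar)
  }' "$1"
}
```

Usage: csv_bars sizes.csv 200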
breakdown of file extensions
cmd: find arch -type f | sed -n 's/..*\.//p' | sort | uniq -c | sort -r
result:
11021 sig
11002 xz
51 txt
46 gz
24 old
17 img
9 torrent
9 iso
6 LICENSE
3 sha512
3 sfs
2 01/arch/boot/x86_64/vmlinuz
1 12/boot/vmlinuz_x86_64
1 12/boot/vmlinuz_i686
1 08/boot/vmlinuz_x86_64
1 08/boot/vmlinuz_i686
1 06/boot/vmlinuz_x86_64
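1 02/arch/boot/x86_64/vmlinuz
note: entries such as 01/arch/boot/x86_64/vmlinuz are not real extensions; they come from extensionless files that live under dotted directories like iso/2019.07.01.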
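The sed expression in the command above matches up to the last dot anywhere in the path, so a dot in a directory name is treated like the start of an extension. A minimal sketch of a variant that looks only at the file name follows; ext_counts is a hypothetical helper name, and like the original it skips files with no dot in their name.

```shell
# ext_counts DIR (hypothetical helper): count file extensions under DIR using
# only the last path component, so dots in directory names are ignored.
ext_counts() {
  find "$1" -type f |
    sed 's|.*/||' |            # keep only the file name
    sed -n 's/..*\.//p' |      # keep what follows the last dot, if any
    sort | uniq -c | sort -rn
}
```

Usage: ext_counts arch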
note: For the next two, you may need to brew install tree on a Mac.
total directories and files
cmd: tree arch/ | tail -1
result: 64 directories, 22203 files
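Where tree is not installed, find can produce a similar summary. This is a sketch with a hypothetical function name; the numbers can differ from tree's, since tree skips hidden entries by default and find does not.

```shell
# count_entries DIR (hypothetical helper): print "<n> directories, <m> files"
# for everything below DIR. -mindepth 1 leaves DIR itself out of the
# directory count.
count_entries() {
  dirs=$(find "$1" -mindepth 1 -type d | wc -l)
  files=$(find "$1" -type f | wc -l)
  # arithmetic expansion strips the padding some wc implementations add
  echo "$((dirs)) directories, $((files)) files"
}
```

Usage: count_entries arch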
simplified tree view
cmd: tree -d arch/
result:
arch/
├── community
│   └── os
│       └── x86_64
│           └── local
├── community-staging
│   └── os
│       └── x86_64
├── community-testing
│   └── os
│       └── x86_64
├── core
│   └── os
│       └── x86_64
├── extra
│   └── os
│       └── x86_64
├── gnome-unstable
│   └── os
│       └── x86_64
├── iso
│   ├── 2019.05.02
│   │   └── arch
│   │       ├── boot
│   │       │   └── x86_64
│   │       └── x86_64
│   ├── 2019.06.01
│   │   └── arch
│   │       ├── boot
│   │       │   └── x86_64
│   │       └── x86_64
│   ├── 2019.07.01
│   │   └── arch
│   │       ├── boot
│   │       │   └── x86_64
│   │       └── x86_64
│   └── archboot
│       ├── 2016.08
│       │   └── boot
│       ├── 2016.12
│       │   └── boot
│       ├── 2018.06
│       │   └── boot
│       └── history
├── kde-unstable
│   └── os
│       └── x86_64
├── multilib
│   └── os
│       └── x86_64
├── multilib-staging
│   └── os
│       └── x86_64
├── multilib-testing
│   └── os
│       └── x86_64
├── pool
│   ├── community
│   └── packages
├── staging
│   └── os
│       └── x86_64
└── testing
    └── os
        └── x86_64
64 directories
dutree is also a very nice tool for visualizing the relative sizes of nested directories in the terminal. If you have Rust installed, grab it with cargo install dutree.
tree view with size breakdown
cmd: dutree -d2 arch
result:
[ arch 47.50 GiB ]
├─ pool │ █████████████████████████████████████████████████│ 80% 38.45 GiB
│ ├─ community │ ░░░░░░░░░░░░█████████████████████████████████████│ 76% 29.55 GiB
│ └─ packages │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░██████████│ 23% 8.89 GiB
├─ iso │ ██████████│ 18% 8.97 GiB
│ ├─ archboot │ ░░░░░█████│ 57% 5.16 GiB
│ ├─ 2019.07.01 │ ░░░░░░░░░░│ 14% 1.28 GiB
│ ├─ 2019.06.01 │ ░░░░░░░░░░│ 14% 1.27 GiB
│ └─ 2019.05.02 │ ░░░░░░░░░░│ 14% 1.26 GiB
├─ community │ │ 0% 51.92 MiB
│ └─ os │ │ 99% 51.92 MiB
├─ extra │ │ 0% 22.69 MiB
│ └─ os │ │ 99% 22.69 MiB
├─ core │ │ 0% 2.41 MiB
│ └─ os │ │ 99% 2.41 MiB
├─ kde-unstable │ │ 0% 1.06 MiB
│ └─ os │ │ 99% 1.06 MiB
├─ multilib │ │ 0% 1.05 MiB
│ └─ os │ │ 99% 1.05 MiB
├─ staging │ │ 0% 449.09 KiB
│ └─ os │ │ 99% 449.00 KiB
├─ testing │ │ 0% 350.96 KiB
│ └─ os │ │ 99% 350.86 KiB
├─ community-testing │ │ 0% 270.26 KiB
│ └─ os │ │ 99% 270.16 KiB
├─ gnome-unstable │ │ 0% 183.86 KiB
│ └─ os │ │ 99% 183.76 KiB
├─ community-staging │ │ 0% 59.60 KiB
│ └─ os │ │ 99% 59.51 KiB
├─ multilib-testing │ │ 0% 10.89 KiB
│ └─ os │ │ 99% 10.79 KiB
├─ multilib-staging │ │ 0% 6.62 KiB
│ └─ os │ │ 98% 6.52 KiB
├─ lastsync │ │ 0% 11 B
└─ lastupdate │ │ 0% 11 B
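Without dutree, plain du (mentioned at the top of this issue) gives a coarser version of the same breakdown. A minimal sketch, with dir_sizes as a hypothetical helper name; sizes are reported in KiB of disk usage, so they will not match dutree's figures exactly.

```shell
# dir_sizes DIR (hypothetical helper): print the disk usage in KiB of each
# top-level entry in DIR, largest first.
dir_sizes() {
  du -sk "$1"/* | sort -rn
}
```

Usage: dir_sizes arch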
You can also import that CSV into Excel or Numbers and make some pretty graphs.
Super helpful, thanks 👍
Updated with real data now that rsync has finished, also uploaded the csv file here: https://gist.github.com/andrew/3ca196c9aa464a9a35d23e669d6e70bd
This is ready to close, but MHz is going to think about whether it makes sense to document this somewhere else as well, and/or turn it into a follow-on task: we should now run these commands on the other repos (see #75 for the list).