forest
Don't deploy Rust docs to GH pages
Issue summary
Self-hosting Rust docs makes regular Forest clones/forks quite heavy: 419 MB as of the latest main commit, b7ac40354c713ef2e087a6e4c1f0c59c61a28bef.
The largest culprit is the Rust documentation.
Use this command to list every blob in the history, sorted by size:
git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | sed -n 's/^blob //p' | sort --numeric-sort --key=2 | cut -c 1-12,41- | $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest | col
Sample output:
...
2e6ab4cd5af 4.8MiB rustdoc/implementors/core/panic/unwind_safe/trait.RefUnwindSafe.js
fcc469f254c8 4.8MiB rustdoc/implementors/core/panic/unwind_safe/trait.RefUnwindSafe.js
3a3a46cdb2c0 4.8MiB rustdoc/implementors/core/panic/unwind_safe/trait.RefUnwindSafe.js
46c9f562f2cc 4.8MiB rustdoc/implementors/core/panic/unwind_safe/trait.RefUnwindSafe.js
003aba4fb25f 4.8MiB rustdoc/implementors/core/panic/unwind_safe/trait.RefUnwindSafe.js
5baeacce618e 4.8MiB rustdoc/implementors/core/panic/unwind_safe/trait.RefUnwindSafe.js
...
df90f0b30cc4 16MiB rustdoc/search-index.js
e8500f3b8af1 16MiB rustdoc/search-index.js
05e123abbb51 16MiB rustdoc/search-index.js
6cebbd16bff8 16MiB rustdoc/search-index.js
a74c4b64e5c4 16MiB rustdoc/search-index.js
c830e0d1c911 16MiB rustdoc/search-index.js
4cb635118402 16MiB rustdoc/search-index.js
ba3b8b29d3d9 16MiB rustdoc/search-index.js
31f710c25df8 17MiB rustdoc/search-index.js
This is too large and must be dealt with.
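To put a single number on how much of the history the generated docs account for, the same `git rev-list` / `git cat-file` plumbing can be piped into `awk`. The following is a minimal sketch; `blob_bytes_under` is a hypothetical helper name, and it reports uncompressed blob sizes, so the on-disk pack size will be smaller.

```shell
# Hypothetical helper: sum the uncompressed size (in bytes) of every blob
# reachable from any ref whose recorded path starts with the given prefix.
blob_bytes_under() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v prefix="$1" '
      # Keep blobs only; trees and commits are skipped.
      $1 == "blob" && index($3, prefix) == 1 { sum += $2 }
      END { print sum + 0 }
    '
}

# Usage, run from inside a clone:
#   blob_bytes_under rustdoc/
```

Comparing the result for `rustdoc/` against the total for an empty prefix (`blob_bytes_under ""`) gives a rough share of the history taken up by the docs.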
- [ ] Remove deploying Rust docs to GH pages
- [ ] Clean up the gh-pages branch - most likely it will need to be squashed (or re-created) so that the objects are no longer present in git history.
- [ ] Verify that the clone/fork size is significantly smaller than it was before.
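The squash step in the checklist above could look roughly like the sketch below. It runs entirely in a throwaway repository created with `mktemp`, so the repository name, file contents, commit messages, and the temporary branch name `gh-pages-squashed` are all illustrative assumptions. On the real repository the final step would additionally be a force push of the rewritten branch, which is left out here.

```shell
#!/bin/sh
# Sketch: replace a docs branch that has many commits with a single root commit.
set -eu

work=$(mktemp -d)
cd "$work"
git init --quiet demo
cd demo
git config user.email "demo@example.com"
git config user.name "Demo"

# Start on an unborn gh-pages branch and simulate five docs deployments,
# each one rewriting the same generated file.
git symbolic-ref HEAD refs/heads/gh-pages
for i in 1 2 3 4 5; do
  printf 'search index, revision %s\n' "$i" > search-index.js
  git add search-index.js
  git commit --quiet -m "deploy docs $i"
done
commits_before=$(git rev-list --count gh-pages)

# Re-create the branch as an orphan: the working tree and index are kept,
# but the new commit has no parents.
git checkout --quiet --orphan gh-pages-squashed
git commit --quiet -m "docs: single squashed deployment"
git branch -M gh-pages-squashed gh-pages

# Drop the now-unreachable commits locally so the size change is visible.
git reflog expire --expire=now --all
git gc --quiet --prune=now
commits_after=$(git rev-list --count gh-pages)

echo "commits before: $commits_before, after: $commits_after"
git count-objects -v
```

Running `git count-objects -vH` in a fresh clone before and after the rewrite is one way to check the last item on the list.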
Other information and links
Other big Rust projects must face this issue too. I'll have a look at how e.g. Servo addresses it.
See also https://github.com/rust-lang/rust/issues/31387
Known issue: https://www.reddit.com/r/rust/comments/wy3j50/psa_if_youre_using_ghpages_to_host_your/
> As a first-step measure, we changed the CI script to overwrite the gh-pages branch at every run, rather than just appending a new commit. We use this gh-pages action, so it was just a matter of adding a force_orphan: true parameter.
>
> Turns out that this improved the situation a lot: when it no longer needs to keep history, git manages to compress that 220MB of documentation very well, and now the whole Smithay git repo is only ~15 MB!
We currently use JamesIves/github-pages-deploy-action, which looks like it commits to a gh-pages branch?
https://github.com/ChainSafe/forest/blob/b7ac40354c713ef2e087a6e4c1f0c59c61a28bef/.github/workflows/docs.yml#L81-L87
Maybe switching to actions/deploy-pages would be better, or maybe to the fork mentioned in the Reddit thread.
I would suggest exploring Cloudflare Pages for this, as well as pruning the git history. Cloning the full repo sucks 😁