layer5
Large repo size: unwanted .pack files
Current Behavior
The layer5 repo is over 2GB in size due to unwanted `.pack` files in `.git/objects/pack`.
Desired Situation
A smaller repo size.
Contributor Resources
The layer5.io website uses Gatsby, React, and GitHub Pages. Site content is found under the `master` branch.
- See contributing instructions
- See Layer5 site designs in this Figma project. Join the Layer5 Community for access.
Hi @leecalcote, can I take this on?
Sure @Jordan-Rob, sorry I missed the comment.
@leecalcote @warunicorn19 I think those pack files are not committed to the repo, and these files are required as part of the Git object database: https://git-scm.com/book/en/v2/Git-Internals-Git-Objects
Ref: https://stackoverflow.com/questions/49535201/pack-file-remove-it-in-git
Ohh okay, so a big NO NO on deleting the `.git/objects/pack` files.
Yep, and I don't think it would matter, as `.git` is not committed to the repo.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hey @warunicorn19 @adithyaakrishna, can you please check out https://github.com/18F/C2/issues/439? It seems like we can reduce the unwanted .pack files.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@adithyaakrishna @warunicorn19 any insights on this? And on @Aju100's approach?
@leecalcote @Chadha93 I want to take up this issue if no one is working on it rn.
All yours @Abhijay007
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue is being automatically closed due to inactivity. However, you may choose to reopen this issue.
After the site is built, `node_modules` and `.cache` take up some space, but both of these directories are listed in `.gitignore`. The `.pack` files in the `.git` directory are the culprit.
```
--- /layer5 -------------------------------------------------------------------------
2.4 GiB [##############] /.git
1.0 GiB [###### ] /node_modules
510.5 MiB [## ] /.cache
367.0 MiB [## ] /src
271.9 MiB [# ] /public
52.2 MiB [ ] /static
1.8 MiB [ ] package-lock.json
428.0 KiB [ ] /.github
324.0 KiB [ ] /.devcontainer
196.0 KiB [ ] /content-learn
20.0 KiB [ ] gatsby-node.js
20.0 KiB [ ] CONTRIBUTING.md
20.0 KiB [ ] /.vscode
16.0 KiB [ ] gatsby-config.js
12.0 KiB [ ] LICENSE
12.0 KiB [ ] README.md
12.0 KiB [ ] /.husky
8.0 KiB [ ] .DS_Store
4.0 KiB [ ] package.json
4.0 KiB [ ] fonts.css
4.0 KiB [ ] .eslintrc.js
4.0 KiB [ ] GOVERNANCE.md
4.0 KiB [ ] .gitignore
4.0 KiB [ ] Makefile
4.0 KiB [ ] root-wrapper.js
4.0 KiB [ ] CODE_OF_CONDUCT.md
4.0 KiB [ ] Makefile.show-help.mk
4.0 KiB [ ] .babelrc
4.0 KiB [ ] .eslintignore
4.0 KiB [ ] gatsby-browser.js
4.0 KiB [ ] script.sh
4.0 KiB [ ] gatsby-ssr.js
4.0 KiB [ ] .gitattributes
4.0 KiB [ ] .env.development
4.0 KiB [ ] CODEOWNERS
4.0 KiB [ ] CNAME
Total disk usage: 4.6 GiB   Apparent size: 4.2 GiB   Items: 139,693
```
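A per-directory breakdown like the one above can be approximated without a dedicated disk-usage tool. The following is a minimal Python sketch, assuming it is run from the root of a local checkout; `tree_size`, `top_level_sizes` and `human` are illustrative names, and because it sums apparent file sizes its totals will sit closer to the "Apparent size" figure than to the disk-usage figure.

```python
import os
from pathlib import Path
from typing import List, Tuple


def tree_size(path: Path) -> int:
    """Apparent size in bytes of a file, or of every file under a directory.

    Symlinks are never followed (os.walk skips symlinked dirs by default).
    """
    if path.is_symlink() or not path.is_dir():
        return path.lstat().st_size
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += (Path(root) / name).lstat().st_size
    return total


def top_level_sizes(repo: Path) -> List[Tuple[int, str]]:
    """Return (size_in_bytes, entry_name) for each top-level entry, largest first."""
    return sorted(((tree_size(entry), entry.name) for entry in repo.iterdir()), reverse=True)


def human(size: int) -> str:
    """Format a byte count with binary units, e.g. 1536 -> '1.5 KiB'."""
    value = float(size)
    for unit in ("B", "KiB", "MiB"):
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} GiB"


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root.
    for size, name in top_level_sizes(Path(".")):
        print(f"{human(size):>12}  {name}")
```

For the `.git` directory on its own, `git count-objects -vH` reports the packed size directly in its `size-pack` line.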
Hey @leecalcote, @Chadha93, can I look into this issue?
@leecalcote I looked through our `.pack` files to identify exactly which blobs are taking up so much space, and as far as I can tell the cause is the assets and media files we use, such as .png/.jpg/.mp4 files. Running `git gc --aggressive` helps only a bit, and the most commonly recommended way to reduce the `.pack` file size is to remove the media files from the repo and store them elsewhere.
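One way to do the blob inspection described above is to feed `git rev-list --objects --all` into `git cat-file --batch-check`. The Python sketch below wraps that pipeline; the helper names `parse_batch_check` and `list_largest_blobs` are hypothetical, and it ranks blobs by `%(objectsize:disk)`, the space an object occupies in the packs, so sizes for delta-compressed objects are approximate.

```python
import subprocess
from typing import List, Tuple

# Format for `git cat-file --batch-check`; %(rest) echoes the path that
# `git rev-list --objects --all` printed after each object id.
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize:disk) %(rest)"


def parse_batch_check(output: str, top_n: int = 10) -> List[Tuple[int, str, str]]:
    """Return the top_n largest blobs as (size_on_disk, object_id, path) tuples."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)  # the path (4th field) may contain spaces
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees, tags and malformed lines
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    blobs.sort(reverse=True)
    return blobs[:top_n]


def list_largest_blobs(repo: str = ".", top_n: int = 10) -> List[Tuple[int, str, str]]:
    """Run git in `repo` and return the largest blobs reachable from any ref."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    sizes = subprocess.run(
        ["git", "-C", repo, "cat-file", f"--batch-check={BATCH_FORMAT}"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return parse_batch_check(sizes, top_n)


if __name__ == "__main__":
    for size, oid, path in list_largest_blobs():
        print(f"{size / 1024 / 1024:8.1f} MiB  {oid[:12]}  {path}")
```

Since `--all` walks every ref, blobs that were deleted from the current tree still show up; that reachable history is what `git gc` cannot shrink.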
Thanks for looking into this, @XDRAGON2002. Yes, I agree. The size of the `.git` directory in the comment above reinforces this fact.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue is being automatically closed due to inactivity. However, you may choose to reopen this issue.
FYI @randychilau
Hi @leecalcote,
Unfortunately, the process of reducing/cleaning a repo does not seem straightforward or well documented for a public repo like Layer5.
I have outlined what I believe to be the phases required for this task. Please let me know if any items are missing, issues were overlooked, or you have questions. It also seems this will require a fair amount of coordination and careful scheduling to execute, especially in the later phases.
I only have a basic understanding of git, so it would be great to have more experienced users review the information below.
Please include whoever else should be in this discussion.
Cheers, Randy
Note:
- All of the repo changes can be tested on a clone and uploaded to a new Layer5 test repo for testing/review.
- Using Git LFS seems to be a best practice for assets (e.g. image, video, and zip files).
- The big question is whether to upload the filtered clone and overwrite the existing repo (complex), or to create a new repo and upload to it (simple). Either way, there are logistics to handle (e.g. issues, pull requests, comments, etc.).
- If you wish to upload a filtered clone to the existing repo, there are many considerations involved, as described in the “DISCUSSION” section (points 4, 5, 6) of the `filter-repo` user manual. Here is one of them:
  “People who cloned from the original repo will have old history. When they fetch the new history you force pushed up, unless they do a git reset --hard @{u} on their branches or rebase their local work, git will think they have hundreds or thousands of commits with very similar commit messages as what exist upstream (but which include files you wanted excised from history), and allow the user to merge the two histories, resulting in what looks like two copies of each commit. If they then push this history back up, then everyone now has history with two copies of each commit and the bad files have returned. You’re more likely to succeed in forcing people to get rid of the old history if they have to clone a new URL.”
- Here is a glimpse at the potential final result for repo size:
Phase 1: Create a test filtered clone
- Remove all unused packages (using tools like depcheck, plus the IDE's find to double-check) and any files in the `assets` and `static` folders that are no longer used (e.g. zip files; confirm with maintainers before doing this on the actual clone).
- Remove untracked files and directories using `git clean`.
- Move all large files and/or specified file types to Git LFS. Two methods:
  - Using `git add --renormalize`
  - Using BFG-repo-cleaner
- Utilize the `filter-repo` script, described as:
  “Rapidly rewrite entire repository history using user-specified filters. This is a destructive operation which should not be used lightly; it writes new commits, trees, tags, and blobs corresponding to (but filtered from) the original objects in the repository, then deletes the original history and leaves only the new.”
  - Use `git clone --bare` to make a [copy of layer5](https://docs.github.com/en/repositories/creating-and-managing-repositories/duplicating-a-repository) and fetch the LFS objects
  - Run the filter-repo script with the `--analyze` flag, sample:
  - Run the filter-repo script with `--invert-paths --paths-from-file ./filter-repo/analysis/path-deleted-sizes.txt` (the report also lists sizes and dates next to each path, so it likely needs to be reduced to one path per line first)
- Upload the test filtered clone to a newly created Layer5 test repo.
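The `path-deleted-sizes.txt` report written by `git filter-repo --analyze` lists sizes and dates alongside each path, while `--paths-from-file` expects one path per line, so the report probably needs trimming before the `--invert-paths` run. The sketch below assumes data rows look like `<unpacked size> <packed size> <date> <path>` (check this against the actual report first); `deleted_paths`, the 1 MiB cutoff and the `paths-to-remove.txt` file name are all hypothetical.

```python
from pathlib import Path
from typing import List


def deleted_paths(report: str, min_packed_bytes: int = 0) -> List[str]:
    """Pull path names out of a filter-repo path-deleted-sizes.txt report.

    ASSUMPTION: data rows are '<unpacked> <packed> <date> <path>' separated by
    whitespace; header lines such as '=== ... ===' and 'Format: ...' are skipped.
    """
    paths = []
    for line in report.splitlines():
        fields = line.split(None, 3)  # maxsplit=3 keeps spaces inside the path
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue  # header, blank or unexpected line
        if int(fields[1]) >= min_packed_bytes:
            paths.append(fields[3])
    return paths


if __name__ == "__main__":
    # Run from inside the bare clone, after `git filter-repo --analyze`.
    report = Path("filter-repo/analysis/path-deleted-sizes.txt").read_text()
    selected = deleted_paths(report, min_packed_bytes=1024 * 1024)  # hypothetical cutoff
    Path("paths-to-remove.txt").write_text("".join(p + "\n" for p in selected))
    print(f"{len(selected)} paths written; review them, then run:")
    print("git filter-repo --invert-paths --paths-from-file paths-to-remove.txt")
```

Reviewing the generated list before the destructive run matters, since `--invert-paths` removes every listed path from all of history.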
Phase 2: Review the test filtered clone for functionality, GitHub Actions, history, etc.
This phase is complete once the following are approved and decided:
- Process for creating the test filtered clone
- Test filtered clone functionality and history
- Where to upload the final clone (existing or new repo)
Phase 3: Get the current repo into a finalized state to create the filtered clone
- All open pull requests should be either closed or merged:
  “The git filter-repo tool and the BFG Repo-Cleaner rewrite your repository's history, which changes the SHAs for existing commits that you alter and any dependent commits. Changed commit SHAs may affect open pull requests in your repository. We recommend merging or closing all open pull requests before removing files from your repository.” (src)
- Notify all current and potential contributors that the repo will be undergoing maintenance and that there will be no access to or activity in the repo during this process.
Phase 4: Create the filtered clone and upload it to the decided location
- Create a backup of the repository
- Go through the approved filtered clone creation process
- Upload the clone to the decided location
- If this is a new location:
  - transfer/migrate information (e.g. issues)
  - build the site and make sure the custom URL points to the correct repo/branch and the site is live.
Phase 5: After upload, update contributors and relevant information
- Update CONTRIBUTING.md and related files to include any instruction changes (e.g. using LFS).
- Notify contributors of the actions to take to reconcile with the new repo (e.g. cloning from the new URL).
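For the Git LFS step in Phase 1 and the matching contributor instructions here, the rules that `git lfs track "<pattern>"` records live in `.gitattributes`, one line per pattern ending in `filter=lfs diff=lfs merge=lfs -text`. The Python sketch below adds such lines idempotently; `add_lfs_patterns` is a hypothetical helper and the pattern list is an example, not a decided set.

```python
from pathlib import Path
from typing import Iterable

# Attributes that `git lfs track "<pattern>"` writes for each tracked pattern.
LFS_ATTRS = "filter=lfs diff=lfs merge=lfs -text"


def add_lfs_patterns(gitattributes: str, patterns: Iterable[str]) -> str:
    """Return .gitattributes text with an LFS rule for each pattern not already tracked."""
    lines = gitattributes.splitlines()
    # A pattern counts as tracked only if its line already carries filter=lfs.
    tracked = {line.split()[0] for line in lines if "filter=lfs" in line.split()[1:]}
    for pattern in patterns:
        if pattern not in tracked:
            lines.append(f"{pattern} {LFS_ATTRS}")
            tracked.add(pattern)
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    # Hypothetical asset types; the real list should be agreed with maintainers.
    media = ["*.png", "*.jpg", "*.mp4", "*.zip"]
    attrs = Path(".gitattributes")
    current = attrs.read_text() if attrs.exists() else ""
    attrs.write_text(add_lfs_patterns(current, media))
```

After updating `.gitattributes`, running `git add --renormalize .` (with Git LFS installed) and committing turns matching files into LFS pointers. That only affects new commits; older commits still hold the original blobs, which is why the history rewrite in Phase 1 is also needed.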
References:
- Other documented experiences reducing repo size: Trimble
- Guides:
  - Atlassian - "Reduce repository size"
  - Etsi - "Reduce repository size (FREE)"
  - Gitential - "The Gitential Guide on How to Reduce the Size of Your Git Repository"
- Tools: git-sizer, git-filter-repo, bfg-repo-cleaner
@Nikhil-Ladha
@randychilau FYI - https://discuss.layer5.io/t/looking-for-a-difficult-git-challenge/2996