After using Git LFS to manage files, the GitLab repository actually grew larger in size

Open zhangshiyu12345 opened this issue 10 months ago • 3 comments

I ran the following commands on a branch of a repository hosted on GitLab:
git lfs prune
git lfs install
 
git lfs track "*.psd"
git lfs track "*.zip"
git lfs track "*.png"
git lfs track "*.jpg"
git lfs track "*.gif"
git lfs track "*.dll"
git lfs track "*.lib"
git lfs track "*.so"
git lfs track "*.tar.gz"
git lfs track "*.deb"
git lfs track "*.war"
git lfs track "*.pdb"
git lfs track "*.ssm"
git lfs track "*.bat"
git lfs track "*.pdf"
git lfs track "*.xlsx"
git lfs track "*.docx"
git lfs track "*.txt"
git lfs track "*.svg"
git lfs track "*.pptx"
git lfs track "*.exe"

    
git add .
git commit -m "Enable Git LFS"

# git lfs migrate import --include="*.zip *.psd *.png *.jpg *.gif *.dll *.lib *.so *.exe *.tar.gz *.deb *.rpm *.war *.pdb *.ssm *.bat *.db *.pdf *.xlsx *.docx *.pptx *.svg *.txt" --everything

git push --force

After pushing, the repository details page in GitLab showed that the repository had actually grown, from 10 GB to 13.3 GB. Why is this?

zhangshiyu12345 • Dec 28 '24 10:12

I don't know how GitLab stores LFS files and how it measures the repository size. However, I think it likely that it stores each LFS object individually and does not take advantage of their similarities. Compared to the Git pack format, this could especially increase the space consumption of *.docx files that have multiple versions. A *.docx file is a zip package in which each part is individually compressed; if multiple versions of a *.docx file include the same image, then the differences in other (e.g. text) parts of the file do not affect how the image is compressed, and Git pack compression can probably take advantage of this similarity, but the Git LFS server might not be able to do the same.

KalleOlaviNiemitalo • Dec 28 '24 12:12
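
To illustrate the storage argument above, here is a minimal sketch and not a measurement of GitLab or of Git's real pack format. It assumes two hypothetical versions of a file that share a 20 kB incompressible payload (standing in for an embedded image) and differ only in a short text part. Running gzip over both versions as one stream is used as a rough stand-in for similarity-aware storage; actual Git packs use delta compression, which works differently.

```shell
#!/bin/sh
# Hypothetical demo: per-object compression vs. compression that can see both
# versions at once. File names and sizes here are made up for illustration.
set -eu

workdir=$(mktemp -d)

# Shared, incompressible part (stand-in for an image inside a *.docx).
head -c 20000 /dev/urandom > "$workdir/payload"
{ echo "text of version 1"; cat "$workdir/payload"; } > "$workdir/v1"
{ echo "text of version 2, slightly edited"; cat "$workdir/payload"; } > "$workdir/v2"

# Per-object storage: each version is compressed on its own.
separate=$(( $(gzip -c "$workdir/v1" | wc -c) + $(gzip -c "$workdir/v2" | wc -c) ))

# Stand-in for similarity-aware storage: compress both versions as one stream,
# so the second copy of the payload can be encoded as references to the first.
together=$(( $(cat "$workdir/v1" "$workdir/v2" | gzip -c | wc -c) ))

echo "separate: $separate bytes"
echo "together: $together bytes"

rm -rf "$workdir"
```

The combined stream should come out at roughly half the size of the two separately compressed versions, because the shared payload is stored only once. A server that keeps every LFS object as its own file cannot benefit from that kind of overlap.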

git lfs track "*.bat"

That seems like a bad idea. In my experience, *.bat files are not very large, and if there are multiple versions, then the differences between versions are small.

KalleOlaviNiemitalo • Dec 28 '24 12:12
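
One way to check which patterns are worth tracking is to total the file sizes per extension before running `git lfs track`. The function below is a sketch under that assumption; `size_by_ext` is a hypothetical helper name, and it only looks at the current working tree, not at the repository history.

```shell
#!/bin/sh
# size_by_ext DIR: print "<total bytes> <file count> <extension>" for every
# file extension under DIR (skipping the .git directory), largest total first.
# Hypothetical helper; it inspects the working tree only, not past commits.
size_by_ext() {
    find "$1" -type f ! -path '*/.git/*' -exec sh -c '
        for f do
            base=${f##*/}
            case $base in
                ?*.*) ext=${base##*.} ;;   # text after the last dot
                *)    ext="(none)" ;;      # no extension, or a dotfile
            esac
            printf "%s %s\n" "$(( $(wc -c < "$f") ))" "$ext"
        done' sh {} + |
    awk '{ bytes[$2] += $1; count[$2]++ }
         END { for (e in bytes) printf "%d %d %s\n", bytes[e], count[e], e }' |
    sort -rn
}
```

Running `size_by_ext . | head` in the repository root lists the heaviest extensions first. Extensions with a small total, such as *.bat or *.txt in many projects, are usually better left in plain Git. For sizes across the whole history, `git lfs migrate info --everything` gives a similar per-extension summary.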

Hey, I'm sorry for the trouble, and I hope you've contacted GitLab's support team and asked for their help in answering your question.

As this project is just for the open-source Git LFS client software, we usually can't answer questions about the internal operations of the various Git LFS hosting providers like GitLab, GitHub, etc. You will be best off contacting GitLab directly, as they should be able to explain what their system is doing in your case.

That said, I suspect @KalleOlaviNiemitalo's comments above are correct, that GitLab stores each Git LFS object separately.

As well, although you have run a git push --force command, GitLab's servers may not immediately prune your existing Git history and run a full Git garbage collection on your repository, so they could be storing the commits from both your previous Git history (which didn't use Git LFS) and your new one (which does use Git LFS).

chrisd8088 • Jan 06 '25 19:01
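
The effect of leftover history can be observed in a local clone, as sketched below. This is only an illustration of the mechanism: `repo_kib` and `drop_unreachable` are hypothetical helper names, and nothing here controls or reflects what GitLab's servers do with a repository after a force push.

```shell
#!/bin/sh
# repo_kib DIR: size of loose plus packed objects in the repository at DIR,
# in KiB, as reported by "git count-objects -v".
repo_kib() {
    git -C "$1" count-objects -v |
        awk '$1 == "size:" || $1 == "size-pack:" { total += $2 }
             END { print total + 0 }'
}

# drop_unreachable DIR: expire all reflog entries, then delete every object
# that no ref reaches any more. Destructive: commits that were rewritten or
# replaced cannot be recovered from this clone afterwards.
drop_unreachable() {
    git -C "$1" reflog expire --expire=now --all &&
    git -C "$1" gc --quiet --prune=now
}
```

Comparing `repo_kib .` before and after `drop_unreachable .` shows how much space the old, no longer referenced commits were occupying locally. Until the equivalent cleanup runs on the server, the hosted repository may hold both the old and the new history.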