textlayout
repository is large (200MB)
I couldn't help noticing that downloading github.com/benoitkugler/textlayout takes quite a while, so I ran
$ du -hs .
311M .
$ du -hs *
155M fonts
4.0K go.mod
4.0K go.sum
19M graphite
12M harfbuzz
440K language
4.0K LICENSE
4.0K README.md
4.0K test
5.2M unicodedata
to check.
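A note on reading these numbers: the per-directory sizes add up to roughly 190 MB while the total is 311 MB. The shell glob `*` skips hidden entries, so the remainder is presumably the `.git` directory. A minimal sketch that reports the two separately (the function name `repo_sizes` is made up for illustration):

```shell
# Hypothetical helper: split a clone's size into checked-out files and .git.
# Sizes are in KiB as reported by `du -sk`.
repo_sizes() {
    repo="${1:-.}"
    # Size of the Git metadata and object store only.
    git_kb=$(du -sk "$repo/.git" | cut -f1)
    # Size of the whole clone, .git included.
    total_kb=$(du -sk "$repo" | cut -f1)
    echo "worktree_kb=$((total_kb - git_kb)) git_kb=$git_kb"
}
```

Running `repo_sizes .` in the clone should show how much of the download is history rather than current files.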
It's unfortunate to have to fetch at least ~200 MB of data, or more than 300 MB with the entire history, just to access the Go source. Would you be open to slimming down the repository and rewriting its Git history to obtain a leaner dependency? If it's inconvenient to slim down the testdata files, perhaps they could be extracted into a separate (test-only) dependency module?
I have no objection to rewriting Git history, but I'm not at all proficient at this exercise!
The weight of the module does indeed come from the test font files. I would prefer not to reduce test coverage, but if they can be extracted into a test-only dependency, let's do it. How should we proceed? Do Go modules have a way to specify test-only deps?
Thanks. I'll take a stab at it if no-one else beats me to it.
Do Go modules have a way to specify test-only deps?
Not that I know of, but I'm hoping that non-test builds can avoid downloading the test-module.
PR #12 changes the tests to use an external module for their data. For the git history rewrite, I came up with:
$ git filter-branch --force --index-filter 'git rm -r --cached --ignore-unmatch fonts/type1C/test graphite/testdata harfbuzz/testdata font/truetype/testdata fonts/type1/type1.test' -- <BRANCH>
from https://www.deployhq.com/git/faqs/removing-large-files-from-git-history.
Note that only BRANCH is rewritten, which is intended; you want to keep your existing tags pointing at the old data. I believe git clone by default pulls data from tags as well, so to reap the size gains the existing tags will have to be deleted at some point.
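To check which paths are worth filtering, and whether they are still reachable from tags, it may help to list the largest blobs reachable from a given ref. This is only a sketch; the function name `largest_blobs` is an illustration, not something from this repository:

```shell
# Hypothetical helper: print "<size-in-bytes> <path>" for the largest blobs
# reachable from a ref. With no argument it walks every ref (--all).
largest_blobs() {
    ref="${1:---all}"     # ref to walk; defaults to --all
    count="${2:-10}"      # number of entries to print
    git rev-list --objects "$ref" |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |   # keep blobs only, drop the type column
        sort -rn |
        head -n "$count"
}
```

Comparing `largest_blobs <BRANCH>` with `largest_blobs --all` after the rewrite should show the test data only in the second listing, as long as old tags still exist.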
Although most likely unrelated, it might be worth mentioning that the size of the downloaded zip currently differs by about 80 MB across three different computers. Needless to say, there are issues on the machine that receives only 53 MB instead of 130 MB.
Although I assume this is a GitHub problem, it might be solved by slimming the repo.
PR #12 has indeed nicely slimmed down the repo.
However, I'm not seeing a large benefit (in size) from running git filter-branch as you hinted: I'm still at 121M for the .git directory...
However, I'm not seeing a large benefit (in size) from running git filter-branch as you hinted: I'm still at 121M for the .git directory...
Did you run git gc? If that doesn't help, I believe it's because you still have other branches or tags referring to commits including the test data. If you replace <BRANCH> with --all in the filter-branch command (perhaps followed by a git gc), the .git directory should slim as well.
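For reference, git filter-branch keeps a backup of the rewritten refs under refs/original/, and the reflog also still points at the old commits, so a plain git gc may not free anything. A sketch of the cleanup steps described in the git filter-branch documentation, wrapped in a function whose name (`prune_after_filter`) is my own:

```shell
# Hypothetical helper: drop the leftovers that keep old objects alive after
# `git filter-branch`, then garbage-collect immediately.
prune_after_filter() {
    # Delete the backup refs that filter-branch stores under refs/original/.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done
    # Expire all reflog entries so they no longer reference old commits.
    git reflog expire --expire=now --all
    # Repack and prune unreachable objects without the usual grace period.
    git gc --quiet --prune=now
}
```

This only helps for refs that were actually rewritten; any branch or tag still pointing at the old commits keeps their objects in the repository.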
However, the reason I suggested a branch instead of --all is that I assume you want to preserve the already released tags for a while. If you change the content of the existing tags, Go will complain about module checksum mismatches for direct module fetches.
In summary, I suggest running filter-branch on your main branch to remove testdata and the stray test binary, and force-push that. Then, after a while, delete old branches and release tags that refer to the old history, leaving only release tags that refer to the new history.
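When the time comes to delete old release tags, something like the following could list the tags whose tree still contains a given path. It is a sketch under my own naming (`tags_with_path` is hypothetical), and the path to pass in would be one of those removed by the filter-branch command:

```shell
# Hypothetical helper: print every tag whose tree still contains the given
# file or directory, i.e. tags that keep the old data reachable.
tags_with_path() {
    path="$1"
    git for-each-ref --format='%(refname:short)' refs/tags |
        while read -r tag; do
            # ls-tree prints nothing when the path is absent from the tag's tree.
            if [ -n "$(git ls-tree -r --name-only "$tag" "$path")" ]; then
                echo "$tag"
            fi
        done
}
```

For example, `tags_with_path harfbuzz/testdata` should print the releases made before the test data was moved out. Those tags could later be removed locally with `git tag -d` and from the remote with `git push origin --delete`, assuming the remote is named origin.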
The repository now seems to be only 17 MB. I think this can be closed.