move large files out of repo into separate location/repo
As raised in https://github.com/publiccodenet/about/pull/461#pullrequestreview-312399224 we have some large files (like 33MB .odt files which have embedded fonts) which might be better outside the repository. This is especially true for files which do not naturally "diff -u" well.
Knowing more about Git now, I think storing files this large in Git does not make much sense for the time being. Every clone gets the full history, and with several of these files the repository could balloon out of proportion. We could enable Git LFS, but that might complicate things.
For now, I think larger files are best left versioned elsewhere.
We now use Nextcloud for this. It looks like the only .odt files left are for the governance exercise (cc @Ainali). Are there any other large files anyone's aware of?
No, I am not aware of anything else (is it possible to search a repo by file size?). And those files should really move to Nextcloud.
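On the question of searching a repository by file size: a minimal sketch, assuming a local clone and that only the checked-out working tree matters. The function name `find_large_files` and the 1 MB default threshold are hypothetical choices for illustration, not something the project uses. It does not inspect Git history, so large files that were committed and later deleted will not show up.

```python
import os
from pathlib import Path


def find_large_files(root, min_bytes=1_000_000):
    """Return (size, relative_path) pairs for files of at least min_bytes, largest first.

    Hypothetical helper: walks the working tree only, not the Git history.
    """
    root = Path(root)
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's internal object store.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # a symlink's target may live outside the repository
            size = path.stat().st_size
            if size >= min_bytes:
                results.append((size, path.relative_to(root).as_posix()))
    return sorted(results, reverse=True)


if __name__ == "__main__":
    # Run from the top of a clone to list everything of 1 MB or more.
    for size, rel in find_large_files("."):
        print(f"{size / 1_000_000:8.1f} MB  {rel}")
```

Run from the root of a clone, this prints one line per large file, which should be enough to confirm whether anything besides the .odt files remains.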
We talked about this briefly in our weekly stewards meeting. One problem with Nextcloud is keeping track of which files are being referred to from external pages. If we move them around or rename them, links might be broken. The same might happen if we change from Nextcloud to something else or (less likely) if we change provider.
An alternative would be to have a separate repository for only the large files. This would bring the advantages of Git while still keeping the 'regular' text-dominated repository lightweight.
Which of these, or any other, alternatives do we prefer?
If we move them around or rename them links might be broken.
Wouldn't GitHub Actions alert us to broken links? Though I guess this only helps us, and not external parties who link to our resources.
Based on this, I do think it would be good to have a separate repository for all the bigger assets we think others may want to link to.
Yeah, GitHub Actions only warns about our own internal links, not incoming links from around the web.
Newly added links are checked on each pull request.
There is also a full link check that runs separately. Its emails contain many false alarms, but we do check them to make sure.
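To illustrate what an internal link check covers, and why it cannot see incoming links from other sites, here is a rough sketch. It is not the check the repository actually runs; the function name `broken_relative_links` and the simplified link pattern are assumptions made for this example. It only looks at inline Markdown links with relative targets and reports those that point to a file missing from the working tree.

```python
import re
from pathlib import Path

# Simplified pattern for inline Markdown links such as [text](target).
# Reference-style links and raw HTML are deliberately not handled.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)")
SCHEME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(root):
    """Return sorted (markdown_file, target) pairs whose local target is missing."""
    root = Path(root)
    broken = []
    for md in root.rglob("*.md"):
        for target in LINK_RE.findall(md.read_text(encoding="utf-8")):
            # External URLs and in-page anchors need a different kind of check.
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            path_part = target.split("#", 1)[0]
            if path_part.startswith("/"):
                # Treat a leading slash as relative to the repository root.
                candidate = root / path_part.lstrip("/")
            else:
                candidate = md.parent / path_part
            if not candidate.exists():
                broken.append((md.relative_to(root).as_posix(), target))
    return sorted(broken)
```

A check like this would catch a renamed or moved asset that our own pages still point to, but a page elsewhere on the web that links to the old location stays invisible to it.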
I expect people linking to assets will link to where they are deployed, not to the repository, so which repository they are pulled from as part of the build should not affect this much.