Archive/Download do not include LFS files or Submodules
- Gitea version (or commit ref): 1.5.0
- Git version: 2.16.4
- Operating system: centos 7
- Database (use [x]):
- [ ] PostgreSQL
- [ ] MySQL
- [ ] MSSQL
- [x] SQLite
- Can you reproduce the bug at https://try.gitea.io:
- [ ] Yes (provide example URL)
- [ ] No
- [x] Not relevant
- Log gist: Downloads of LFS files
Description
With git-lfs installed and enabled on both the gitea server host and the client host, LFS controlled files do not get added properly to the .zip or .tar.gz files when:
- Using the Download Repository button
- Downloading a release
Instead of the expected file, the .zip or .tar.gz contains a text file of the same name (the LFS pointer).
The REST endpoint behaves the same way:
GET /repos/{owner}/{repo}/raw/{filepath}
In other respects, git-lfs works as expected when using git command line to interact with the repo.
Screenshots
Text files look like this:
version https://git-lfs.github.com/spec/v1
oid sha256:a7da80fc96bc0dd73ea0416fda5dfe1321910517634d4b142903a9fbab24f196
size 1465634
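For anyone who needs to recognise these pointer files programmatically, here is a minimal Python sketch. It assumes a pointer consists of exactly the three lines shown above (version, oid, size); pointer files carrying additional keys are not handled. The function name `parse_lfs_pointer` and the 1024-byte cutoff are illustrative choices, not something Gitea or git-lfs provides.

```python
import re

# Layout taken from the pointer file shown above:
# a version line, a sha256 oid line, and a size line.
_POINTER_RE = re.compile(
    rb"\Aversion https://git-lfs\.github\.com/spec/v1\n"
    rb"oid sha256:(?P<oid>[0-9a-f]{64})\n"
    rb"size (?P<size>\d+)\n?\Z"
)


def parse_lfs_pointer(data: bytes):
    """Return (oid, size) if data looks like an LFS pointer, else None."""
    # Pointer files are tiny; the cutoff is an arbitrary guard (assumption)
    # so that large binary content is never run through the regex.
    if len(data) > 1024:
        return None
    match = _POINTER_RE.match(data)
    if match is None:
        return None
    return match.group("oid").decode("ascii"), int(match.group("size"))
```

Calling it on the contents of a downloaded file tells you whether you received the real data or only the pointer.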
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs during the next 2 weeks. Thank you for your contributions.
This issue has been automatically closed because of inactivity. You can re-open it if needed.
Are there any updates, fixes or thoughts on how to approach this issue?
We would love to use Gitea and its API to download releases directly onto deployment servers and end-user machines, but Gitea not including any LFS objects in the downloads is a huge problem. Using git to clone the repository is not an option, as we cannot require our customers to install any extra software.
So there is at least now a GET /repos/{owner}/{repo}/media/{filepath} endpoint which means that you can get the actual lfs'd data.
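Until archives include LFS content, a client-side workaround could be to download the archive and then replace each pointer entry with the data from the media endpoint. The following is only a sketch under assumptions: the `/api/v1` prefix is assumed, the helper names `media_url` and `fill_lfs_pointers` are made up for illustration, pointer detection is a simple prefix check, and `fetch` is a callback you supply (it receives the archive entry name and returns the real bytes).

```python
import io
import zipfile
from urllib.parse import quote

# First line of an LFS pointer file; used as a cheap detection heuristic.
POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1\n"


def media_url(base_url, owner, repo, filepath):
    """Build the media endpoint URL for one file (the /api/v1 prefix is an assumption)."""
    return "{}/api/v1/repos/{}/{}/media/{}".format(
        base_url.rstrip("/"),
        quote(owner, safe=""),
        quote(repo, safe=""),
        quote(filepath),  # keeps "/" so nested paths stay intact
    )


def fill_lfs_pointers(archive_bytes, fetch):
    """Return a new zip in which every pointer-looking entry is replaced by fetch(entry_name)."""
    source = zipfile.ZipFile(io.BytesIO(archive_bytes))
    out_buffer = io.BytesIO()
    with zipfile.ZipFile(out_buffer, "w", zipfile.ZIP_DEFLATED) as target:
        for info in source.infolist():
            data = source.read(info)
            if data.startswith(POINTER_PREFIX):
                # Hypothetical callback: the caller decides how to map the
                # entry name to a repository path and how to authenticate.
                data = fetch(info.filename)
            target.writestr(info.filename, data)
    return out_buffer.getvalue()
```

A real `fetch` would map the entry name to a path inside the repository (dropping any leading directory the archive adds), request `media_url(...)` with whatever authentication the instance needs, and return the response body.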
Could you give me some information as to how you create the zips - I don't immediately know where to look to find the code that creates them.
#7209 might be related
@schmittlauch I'm not certain. I would have to dive to see how these zips are created.
My suspicion is that these zips do not even attempt to dereference the LFS pointers whereas on #7209 your problem is different.
OK so yeah #7209 is not relevant to this.
The issue is that we use git archive to create these archives. That doesn't include submodules either - so I think this needs a complete rethink.
And how does GitHub's archive handle that?
I suspect they rewrote the command. Back in November 2018 https://github.com/git-lfs/git-lfs/issues/1322#issuecomment-426822783 states that they didn't include lfs files (and likely submodules) in their zips.
I think that's what we're going to have to do unfortunately.
This again leads to the slightly annoying issue whereby we don't know what files are LFS files except by reading them and checking if they're a pointer or not.
Similarly we need to do this zipping in the context of the current user and repository. In the case of submodules - it's conceivable that the zip that one user downloads may not be the same as the zip another user gets - I guess that's ok but it means caching these might be difficult unless we cache them with the associated permission state.
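To illustrate the "cache them with the associated permission state" idea, here is a rough sketch of a cache key that changes whenever the set of submodules a user may read changes. Everything here is hypothetical: `archive_cache_key` and its parameters are invented for illustration and do not correspond to existing Gitea code.

```python
import hashlib
import json


def archive_cache_key(repo_id, commit_sha, fmt, visible_submodules):
    """Derive a cache key that differs when the readable submodule set differs."""
    payload = json.dumps(
        {
            "repo": repo_id,
            "commit": commit_sha,
            "format": fmt,
            # Sorted so the key does not depend on iteration order.
            "submodules": sorted(visible_submodules),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two users who can see the same submodules would share a cached archive, while a user with narrower access would get a separate one.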
Finally we must be very careful indeed about which submodules we're happy to include, if any - perhaps just allow those that are local to the gitea instance?
Just wanted to add my two cents to this conversation. Without this feature there is not much point in using git lfs at all. All the development work happens using lfs and when a production version is produced, it contains lots of nasty surprises in blank pointer files. My current approach is to not use lfs at all and handle big binaries separately to git. This creates a lot of extra mess that would be much easier if I could just download the archive.
I know it wouldn't be as fast as git archive, but could Gitea just check out the repository to a temp directory and archive that? That way any smudge filters and submodules would be handled without Gitea having to handle them directly. It could even cache the result for the head of the main branch so it doesn't have to run again on subsequent downloads.
Gitea checking out the repository is not a good idea.
There is a wrapper around git archive that would at least handle submodules: https://github.com/fabacab/git-archive-all.sh/blob/master/git-archive-all.sh
Maybe Gitea could use it?
Hopefully this can be fixed in future versions
How is it going? Does anyone know what the problem is?
Yeah still an issue in 1.16.1 so far
I see this in 1.18.5
Still an issue in v1.19.0
Still an issue in v1.20.3 🥲
AFAIK github does not solve this. The archive you get from github may include LFS but not submodules.
Release archives published on github are created and uploaded by the developer to include the submodules.
If no release archive is uploaded by the developer submodules are not included.
It is probably possible to automate with github actions.
@hramrach GitHub actually solves this (at least for LFS files), and it might be a good idea to make it selectable as GitHub does. I personally don't have a use case for not including them, but others might. Being able to configure the default behaviour in a config file would be nice.
https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/managing-repository-settings/managing-git-lfs-objects-in-archives-of-your-repository
Is Gitea planning to add this feature to the roadmap? When a release zip is downloaded, it is filled with LFS pointer files, which greatly affects usability. Adding a configuration option like GitHub's would be great.
I personally don't have a use case for not including them, but others might.
In my case I need the slim LFS pointer files, NOT the entire blobs. As is, my app is able to deliver slim responses to user browsers through the /api/v1/repos/<owner>/<repo>/archive/<filename>.zip endpoint. Specific LFS blobs are only downloaded on demand, saving enormous bandwidth and UX performance.
I could see a small use case with my app where it would be beneficial to include the blobs in the zip, but most of the time it would need just the LFS pointer files.
So, if this change is made, it should be configurable on a per-request basis, i.e. via a UI option or an API request parameter, not via the git config file, which would remove the ability to choose at runtime. With a runtime option in the UI or API, apps like mine can adapt easily or require no change at all.
Yeah, for API requests that does make sense. For downloads via the UI button, I think it is very likely that you want to include them. But a repository-wide configuration for UI downloads, plus a parameter for API requests, sounds logical to me. For the API, the default should then remain the pointer file so as not to break existing calls.
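As a rough sketch of how that precedence could work (all names here are hypothetical, and no such setting or parameter exists in Gitea as far as this thread shows): an explicit per-request choice wins, API requests otherwise keep returning pointer files, and UI downloads fall back to the repository-wide setting.

```python
from typing import Optional


def include_lfs_objects(
    via_api: bool,
    repo_default: bool,
    request_param: Optional[bool] = None,
) -> bool:
    """Decide whether an archive should contain real LFS content."""
    # An explicit per-request choice always wins.
    if request_param is not None:
        return request_param
    # API callers keep today's behaviour (pointer files) unless they opt in,
    # so existing integrations do not break.
    if via_api:
        return False
    # UI downloads follow the repository-wide setting.
    return repo_default
```

With this shape, an app that relies on slim pointer files needs no change, while a UI user downloading a release gets whatever the repository owner configured.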