GitHub Release as a backend

Open Robinlovelace opened this issue 2 years ago • 6 comments

Just looking at the remote storage options here: https://dvc.org/doc/user-guide/data-management/remote-storage

My go-to place to host data resulting from code in GitHub repos is, naturally I think, releases.

gh release create v1 data.csv

Should work, should be possible to support?

Cc @anitagraser whose great tutorial on this led to this question.

Robinlovelace, Aug 31 '23 12:08

@Robinlovelace Could you elaborate, please? I'm not sure I understand how it would work. Every new dvc push creating a new release?

efiop, Aug 31 '23 13:08

Same as it works for other remote storage options is my thinking.

Robinlovelace, Aug 31 '23 13:08

@Robinlovelace So updating the same release artifacts? Just trying to understand if you have particular ideas or if this is just a general idea.

efiop, Aug 31 '23 14:08

Yes, just a general idea at this stage, with no GitHub-specific thoughts on implementation.

Robinlovelace, Aug 31 '23 14:08

I remember I was discussing this with someone. The benefit is that by default it has pretty nice limits for a happy-path scenario: Each file included in a release must be under 2 GiB. There is no limit on the total size of a release, nor bandwidth usage.

It felt that even a single release could be a DVC remote by itself if you put it under a proper API.
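As a rough illustration of that idea (not something DVC implements), a release could be treated as a flat content-addressed store: each asset is named after the MD5 digest of its content, and anything at or above the 2 GiB per-file limit is rejected up front. The names `MAX_ASSET_BYTES`, `fits_release_limit` and `asset_name_for` are hypothetical, and the naming scheme is an assumption made only for this sketch.

```python
import hashlib

# Per-file limit quoted above: each release asset must be under 2 GiB.
MAX_ASSET_BYTES = 2 * 1024**3


def fits_release_limit(size_bytes: int) -> bool:
    """Return True if a file of this size could be a single release asset."""
    return 0 <= size_bytes < MAX_ASSET_BYTES


def asset_name_for(content: bytes) -> str:
    """Return a flat, content-derived asset name (hypothetical scheme)."""
    if not fits_release_limit(len(content)):
        raise ValueError("file is too large to be a single release asset")
    # Hex digest of the content, so identical files map to the same asset.
    return hashlib.md5(content).hexdigest()
```

With a scheme like this, checking whether a file is already on the "remote" would reduce to checking whether an asset with that name exists in the release.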

It also reminds me a bit of this proposal and discussion by @sisp https://gitlab.com/gitlab-org/gitlab/-/issues/413612, and the motivation may be similar (we also had a discussion on GitLabFS somewhere in one of our repositories).

shcheklein, Aug 31 '23 20:08

> I remember I was discussing this with someone. The benefit is that by default it has pretty nice limits for a happy-path scenario: Each file included in a release must be under 2 GiB. There is no limit on the total size of a release, nor bandwidth usage.
>
> It felt that even a single release could be a DVC remote by itself if you put it under a proper API.

I think the minimum set of operations we need is supported in their API:

https://docs.github.com/en/free-pro-team@latest/rest/releases/assets?apiVersion=2022-11-28
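To make "minimum set of operations" concrete, here is a minimal sketch of the four release-asset calls a remote would likely need (list, download, upload, delete). It only builds the HTTP method and URL for each call and sends nothing. The helper names are hypothetical, and the endpoint paths are written as I understand them from the linked documentation, so they should be checked against it. In practice the upload URL should be taken from the release's `upload_url` field rather than hard-coded.

```python
from urllib.parse import quote

API = "https://api.github.com"
UPLOADS = "https://uploads.github.com"  # assumed host; prefer the release's upload_url


def list_assets(owner, repo, release_id):
    """List the assets attached to a release."""
    return ("GET", f"{API}/repos/{owner}/{repo}/releases/{release_id}/assets")


def download_asset(owner, repo, asset_id):
    """Fetch one asset; send 'Accept: application/octet-stream' to get its bytes."""
    return ("GET", f"{API}/repos/{owner}/{repo}/releases/assets/{asset_id}")


def upload_asset(owner, repo, release_id, name):
    """Upload a new asset; the file content goes in the request body."""
    query = f"name={quote(name, safe='')}"
    return ("POST", f"{UPLOADS}/repos/{owner}/{repo}/releases/{release_id}/assets?{query}")


def delete_asset(owner, repo, asset_id):
    """Delete one asset by its id."""
    return ("DELETE", f"{API}/repos/{owner}/{repo}/releases/assets/{asset_id}")
```

Note that download and delete address assets by numeric id rather than by name, so a remote built on this would probably need to list assets first to resolve a name to an id.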

But I wonder whether we need to consider any terms and conditions regarding the usage of release assets.

daavoo, Sep 01 '23 08:09