
Streaming cloud-versioned datasets

Open dberenbaum opened this issue 2 years ago • 2 comments

https://github.com/iterative/dvc/pull/10164 will introduce datasets as a new type of dependency that isn't based on the local filesystem. The same mechanism can be used to support cloud-versioned data: users can specify a version ID, freeze it, make it a stage dependency, and stream it into their code using the DVC API.

One issue with cloud versioning is that directory contents get exploded into individual file entries, and this may cause problems reading dvc.lock.
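
As a rough illustration of the "stream it into their code" step, here is a minimal sketch using the existing dvc.api.open() call rather than the dataset mechanism from the PR, whose API is not shown in this thread. The helper names (iter_rows, stream_tracked_csv) are hypothetical, the tracked file is assumed to be a CSV, and the call is assumed to run against a Git-tracked DVC project with a configured remote.

```python
import csv
import io
from typing import Dict, Iterator, List, Optional, TextIO


def iter_rows(fileobj: TextIO) -> Iterator[Dict[str, str]]:
    """Yield CSV rows as dicts, reading lazily from an open text file object."""
    yield from csv.DictReader(fileobj)


def stream_tracked_csv(
    path: str, repo: Optional[str] = None, rev: Optional[str] = None
) -> List[Dict[str, str]]:
    """Read a DVC-tracked CSV through dvc.api.open() without `dvc pull`.

    `path` is the tracked file's path inside the project; `repo` and `rev`
    are passed straight through to dvc.api.open().
    """
    # Imported lazily so the parsing helper works even where DVC is not installed.
    import dvc.api

    with dvc.api.open(path, repo=repo, rev=rev, mode="r") as f:
        return list(iter_rows(f))


# Local check of the parsing helper only; no remote access happens here.
sample = io.StringIO("id,label\n1,cat\n2,dog\n")
rows = list(iter_rows(sample))
```

A call such as stream_tracked_csv("data/train.csv") would then read the file from the remote on demand; the path here is a placeholder, not one taken from this issue.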

dberenbaum, Jan 11 '24 18:01

Hi, I'm new to DVC and am using cloud versioning. My data is tracked and stored on GCS, and I want to load it into a dataframe without downloading it to my local machine (dvc pull). I tried using dvc.api.open(), but it gives me a Git SCM error! I'm not sure if this is the right place to bring this up, but I'm really directionless at this point. Is my problem within the scope of the issues to be fixed? Thanks!

Siddharth1060, Jan 19 '24 19:01

@Siddharth1060 Could you please open a separate issue and show the full output of the error you get?

dberenbaum, Jan 19 '24 20:01