Streaming cloud-versioned datasets
https://github.com/iterative/dvc/pull/10164 will introduce datasets as a new type of dependency that isn't based on the local filesystem. This same mechanism can be used to support cloud-versioned data. Users can specify a version ID, freeze it, make it a stage dependency, and stream it into their code using the DVC API.
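For reference, here is a rough sketch of what streaming looks like with the existing `dvc.api.open()` call. It is not the dataset mechanism from the PR above. The helper name `stream_rows`, the `opener` parameter, and the CSV format are assumptions made for illustration, and whether this works against a cloud-versioned remote depends on the setup.

```python
import csv
from typing import Callable, Dict, Iterator, Optional


def stream_rows(
    path: str,
    repo: Optional[str] = None,
    rev: Optional[str] = None,
    opener: Optional[Callable] = None,
) -> Iterator[Dict[str, str]]:
    """Yield CSV rows from a DVC-tracked file without a prior `dvc pull`.

    `opener` is a hypothetical hook for swapping in a stand-in for
    `dvc.api.open` (for example in tests); it is not part of DVC.
    """
    if opener is None:
        # Imported lazily so the helper can be exercised without DVC installed.
        import dvc.api

        opener = dvc.api.open

    # dvc.api.open returns a context manager yielding a file-like object.
    with opener(path, repo=repo, rev=rev, mode="r") as f:
        # Rows are parsed as they are read rather than loaded all at once.
        yield from csv.DictReader(f)
```

Something like `for row in stream_rows("data/train.csv", rev="v1.0"): ...` would then iterate over the file at that Git revision, where the path and the tag are placeholders.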
One issue with cloud versioning is that the directory contents get exploded (each file is listed individually with its own version ID), and this may cause problems reading dvc.lock.
Hi, I'm new to DVC and am using cloud versioning. My data is tracked and stored on GCS, and I want to load it into a dataframe without downloading it to my local machine (that is, without running dvc pull). I tried using dvc.api.open(), but it gives me a Git SCM error! I'm not sure if this is the right place to bring this up, but I'm really directionless at this point. Is my problem in the scope of issues to be fixed? Thanks!
@Siddharth1060 Could you please open a separate issue and show the full output of the error you get?