
Support very large bucket directories

Open simonlsk opened this issue 1 year ago • 0 comments

Right now DagsHubFilesystem offers a listdir method that returns a list. When I try to access a very large bucket directory, that list cannot grow without bound: it has to be fully materialized before anything is returned. Example snippet that will time out:

from dagshub.streaming import DagsHubFilesystem
fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/radiant-mlhub-dataset")
fs.listdir("s3://radiant-mlhub/bigearthnet")

I propose that the client implement an fs.Walk method that returns a generator, so that listings with potentially unbounded content can be consumed lazily.
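
A minimal sketch of the iteration pattern such a generator-based walk could follow, assuming the client has (or gains) some paginated listing call underneath. The walk and fake_list_page names and the continuation-token signature are illustrative only, not existing DagsHub client API:

from typing import Callable, Iterator, List, Optional, Tuple

# list_page is a stand-in for whatever paginated listing call the client would
# use internally (hypothetical; the current client only exposes listdir).
# It takes (path, continuation_token) and returns (entries, next_token).
def walk(
    list_page: Callable[[str, Optional[str]], Tuple[List[str], Optional[str]]],
    path: str,
) -> Iterator[str]:
    token: Optional[str] = None
    while True:
        entries, token = list_page(path, token)
        yield from entries   # hand entries to the caller one page at a time
        if token is None:    # no more pages to fetch
            return

# Fake two-page listing, just to show that the caller never holds the whole
# directory in memory:
def fake_list_page(path: str, token: Optional[str]):
    pages = {None: (["a.tif", "b.tif"], "page2"), "page2": (["c.tif"], None)}
    return pages[token]

for entry in walk(fake_list_page, "s3://radiant-mlhub/bigearthnet"):
    print(entry)

The key point is that the caller iterates lazily; whether the pagination happens via a continuation token, an offset, or something else is an implementation detail of the client.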

simonlsk, Mar 20 '23 10:03