
Submitting a dataset to GCS storage does not store all the data in GCS

Open kk17 opened this issue 3 years ago • 0 comments

In one of my Jupyter notebooks:

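import yogadl.storage  # also binds the top-level `yogadl` name used below

# val_ds is a tf.data.Dataset defined earlier in the notebook.
fs_config = yogadl.storage.GCSConfigurations(
    bucket="mybucket",
    bucket_directory_path="yogadl_cache",
    url="ws://localhost:10050",  # websocket URL of the YogaDL read/write coordinator
    local_cache_dir="/tmp/",
)
storage = yogadl.storage.GCSStorage(fs_config)
storage.submit(val_ds, "dl_a2_val", "1.0")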

In another Jupyter notebook on another machine:

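import yogadl
import yogadl.storage

# `storage` is assumed to be a yogadl.storage.GCSStorage built with the same
# GCSConfigurations as in the first notebook.

# Get the DataRef.
dataref = storage.fetch("dl_a2_val", "1.0")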

I got the following error:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-10-9b0b8157fa6d> in <module>()
      2 
      3 # Get the DataRef.
----> 4 dataref = storage.fetch("dl_a2_samples", "1.0")
      5 
      6 # Tell the DataRef how to stream the dataset.

2 frames
/usr/local/lib/python3.7/dist-packages/google/cloud/storage/blob.py in download_to_filename(self, filename, client, start, end)
    662         """
    663         try:
--> 664             with open(filename, "wb") as file_obj:
    665                 self.download_to_file(file_obj, client=client, start=start, end=end)
    666         except resumable_media.DataCorruption:

FileNotFoundError: [Errno 2] No such file or directory: '/tmp/yogadl_local_cache/dl_a2_samples/1.0/cache.mdb'
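The last frame is plain Python. `open(filename, "wb")` creates the file itself but raises `FileNotFoundError` when the file's parent directory does not exist. The sketch below reproduces that behaviour and shows one possible workaround. It assumes the only problem is that the `yogadl_local_cache/<dataset_id>/<version>/` directory was never created on the second machine. The helpers `local_cache_file` and `ensure_local_cache_dir` are hypothetical, and the path layout they build is copied from the error message, not taken from YogaDL's source.

```python
import os
import tempfile


def local_cache_file(local_cache_dir: str, dataset_id: str, dataset_version: str) -> str:
    """Hypothetical helper: rebuild the path seen in the traceback,
    <local_cache_dir>/yogadl_local_cache/<dataset_id>/<dataset_version>/cache.mdb."""
    return os.path.join(
        local_cache_dir, "yogadl_local_cache", dataset_id, dataset_version, "cache.mdb"
    )


def ensure_local_cache_dir(local_cache_dir: str, dataset_id: str, dataset_version: str) -> str:
    """Hypothetical workaround: create the cache file's parent directory if missing."""
    path = local_cache_file(local_cache_dir, dataset_id, dataset_version)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path


def can_open_for_writing(path: str) -> bool:
    """Return True if open(path, "wb") succeeds, False on FileNotFoundError."""
    try:
        with open(path, "wb"):
            pass
    except FileNotFoundError:
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    target = local_cache_file(tmp, "dl_a2_samples", "1.0")
    # Same failure as the last traceback frame: the parent directory is missing.
    print(can_open_for_writing(target))  # False
    ensure_local_cache_dir(tmp, "dl_a2_samples", "1.0")
    print(can_open_for_writing(target))  # True
```

On the second machine, you could test this hypothesis by calling something like `ensure_local_cache_dir("/tmp/", "dl_a2_samples", "1.0")` before `storage.fetch`. If the download then succeeds, the missing local directory was the problem, not missing data in GCS. Note also that the traceback fetches `dl_a2_samples`, while the first notebook submitted `dl_a2_val`.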

In GCS, I only see a cache.mdb file. Why doesn't submitting a dataset to storage store all of the data in GCS?
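To check whether the data actually reached the bucket, you can list every object under the cache prefix together with its size. The sketch below assumes the `google-cloud-storage` client library (already in use, per the traceback) and default credentials. `list_cached_objects` is a hypothetical helper, and the client is passed in as an argument so it can be replaced with a stub. The local `cache.mdb` path in the traceback suggests that YogaDL may pack the whole dataset into a single LMDB file. If so, a single object is expected, and its size, not the number of files, shows whether the dataset was uploaded.

```python
from typing import List, Tuple


def list_cached_objects(client, bucket: str, prefix: str) -> List[Tuple[str, int]]:
    """Return (object name, size in bytes) for every object under `prefix`.

    `client` is expected to behave like google.cloud.storage.Client, whose
    list_blobs(bucket_or_name, prefix=...) yields Blob objects with .name and .size.
    """
    return [(blob.name, blob.size) for blob in client.list_blobs(bucket, prefix=prefix)]


def total_bytes(objects: List[Tuple[str, int]]) -> int:
    """Sum the sizes returned by list_cached_objects."""
    return sum(size for _, size in objects)
```

Using `from google.cloud import storage as gcs` (aliased to avoid clashing with the YogaDL `storage` object), a call such as `list_cached_objects(gcs.Client(), "mybucket", "yogadl_cache/")` lists what the first notebook uploaded. A `cache.mdb` whose size is close to that of the serialized dataset would indicate the data is there.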

kk17 • Nov 10 '21 12:11