
[Storage] Sky storage does not support single files

Open Michaelvll opened this issue 3 years ago • 3 comments

Sky Storage does not support uploading a single file to a bucket with the following YAML:

file_mounts:
  /test-path:
    name: sky-test-tmp
    source: ~/file.txt
    store: gcs

That is because we use gsutil -m rsync for everything, but we should switch to gsutil -m cp when we find the source is a file.
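A minimal sketch of that switch, assuming a hypothetical helper `build_upload_command` (not SkyPilot's actual code). It only builds the argument list and does not invoke gsutil:

```python
import os
from typing import List


def build_upload_command(source: str, bucket: str) -> List[str]:
    """Hypothetical helper: pick the gsutil subcommand from the source type."""
    path = os.path.abspath(os.path.expanduser(source))
    if os.path.isfile(path):
        # Single file: gsutil rsync expects a directory, so use cp instead.
        return ['gsutil', '-m', 'cp', path, f'gs://{bucket}']
    if os.path.isdir(path):
        # Directory: keep the existing recursive rsync behavior.
        return ['gsutil', '-m', 'rsync', '-r', path, f'gs://{bucket}']
    raise FileNotFoundError(f'Storage source does not exist: {path}')
```

The returned list could be passed to `subprocess.run` by the caller; that part is left out so the sketch stays free of side effects.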

Michaelvll avatar Aug 12 '22 06:08 Michaelvll

I was starting to implement this, but I realized it seems to break our destination/source path semantics. For example, let's use the YAML above:

file_mounts:
  /test-path:
    name: sky-test-tmp
    source: ~/file.txt
    store: gcs

Is /test-path now expected to be a file which has the contents of ~/file.txt? Or is it a directory and we put file.txt at /test-path/file.txt?

  1. If /test-path is intended to be a file, we cannot support MOUNT mode, since most mounting tools don't allow mounting onto a non-empty directory.
  2. If /test-path is intended to be a directory, we can maybe support it, but we will need special handling for COPY mode. We will need to maintain and pass the source (in addition to the bucket name) to the VM during file mounting, to ensure only the specific file is copied to the VM (e.g., when the user uploads additional files to the bucket).
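To make option 2 concrete, here is a rough sketch of the COPY-mode handling, assuming a hypothetical helper `build_vm_copy_command` that receives the source alongside the bucket name. The function name and argument layout are illustrative only:

```python
import posixpath
from typing import List


def build_vm_copy_command(bucket: str, source: str,
                          mount_path: str) -> List[str]:
    """Hypothetical helper: copy only the file named by `source` to the VM.

    `mount_path` is treated as a directory (option 2), so the file lands at
    <mount_path>/<basename of source>. Other objects that were later added
    to the bucket are not touched.
    """
    filename = posixpath.basename(source)
    remote = f'gs://{bucket}/{filename}'
    local = posixpath.join(mount_path, filename)
    return ['gsutil', 'cp', remote, local]
```

With the YAML above, this yields a copy from gs://sky-test-tmp/file.txt to /test-path/file.txt.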

If the user wants to upload just one file, a workaround could be for the user to move the file to a new directory and use that as the source. Would that be feasible?

I am leaning towards keeping Storage sources limited to directories only, but I'm happy to do option 2 if you think it's an important use case.

romilbhardwaj avatar Aug 12 '22 18:08 romilbhardwaj

That is a great discussion! Option 2 sounds good to me. I was a bit afraid that moving files around would scare users, since their files would seem to disappear, especially when doing the auto-translation of file mounts in the spot case. I have started thinking about whether hard links would work for the spot case: we hard link each file into a hidden folder and upload that folder.
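A minimal sketch of the hard-link idea, assuming a hypothetical helper `stage_with_hard_links` and a caller-chosen staging directory. Hard links only work when the file and the staging directory are on the same filesystem, and this sketch does not handle two files with the same basename:

```python
import os
from typing import Iterable


def stage_with_hard_links(files: Iterable[str], staging_dir: str) -> str:
    """Hypothetical helper: hard link each file into a staging directory.

    The original files stay where they are; the staging directory can then
    be uploaded as a regular directory source.
    """
    os.makedirs(staging_dir, exist_ok=True)
    for f in files:
        src = os.path.abspath(os.path.expanduser(f))
        dst = os.path.join(staging_dir, os.path.basename(src))
        if os.path.exists(dst):
            # Replace a stale link left over from a previous run.
            os.remove(dst)
        os.link(src, dst)
    return staging_dir
```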

Michaelvll avatar Aug 16 '22 17:08 Michaelvll

Another thought: could we allow a sub-path in the name, such as

file_mounts:
  /test-path:
    name: sky-test-tmp/file.txt
    source: ~/file.txt
    store: gcs

In this case, /test-path would be the file itself, and we could simply disallow MOUNT mode for any name that contains a sub-path?
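A rough sketch of how that rule could be checked, assuming hypothetical helpers `split_storage_name` and `validate_mode` and assuming the mode is passed as a plain string:

```python
from typing import Optional, Tuple


def split_storage_name(name: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split 'bucket/sub/path' into (bucket, sub_path)."""
    bucket, _, sub_path = name.partition('/')
    return bucket, (sub_path or None)


def validate_mode(name: str, mode: str) -> None:
    """Reject MOUNT mode when the name points at a sub-path in the bucket."""
    _, sub_path = split_storage_name(name)
    if sub_path is not None and mode.upper() == 'MOUNT':
        raise ValueError(
            f'MOUNT mode is not allowed for a name with a sub-path: {name}')
```

For example, `split_storage_name('sky-test-tmp/file.txt')` separates the bucket `sky-test-tmp` from the object path `file.txt`, and `validate_mode('sky-test-tmp/file.txt', 'MOUNT')` raises `ValueError`.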

Michaelvll avatar Aug 16 '22 17:08 Michaelvll

Tracked in #1226 with a deeper discussion there

romilbhardwaj avatar Oct 11 '22 23:10 romilbhardwaj