skypilot
[Storage] Sky storage does not support single files
Sky storage does not support uploading a single file, as with the following YAML:
file_mounts:
  /test-path:
    name: sky-test-tmp
    source: ~/file.txt
    store: gcs
That is because we use gsutil -m rsync for everything, but we should switch to gsutil -m cp when we find the source is a file.
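A minimal sketch of that switch, assuming the upload command is assembled as a shell string. The helper name `gcs_upload_command` and the `-r` flag on rsync are illustrative assumptions, not the existing SkyPilot code:

```python
import os
import shlex


def gcs_upload_command(source: str, bucket: str) -> str:
    """Return a gsutil command that uploads `source` to gs://<bucket>.

    Hypothetical helper for illustration only.
    """
    path = os.path.abspath(os.path.expanduser(source))
    dest = f'gs://{bucket}'
    if os.path.isfile(path):
        # gsutil rsync expects directories/buckets on both sides,
        # so a single file is uploaded with cp instead.
        return f'gsutil -m cp {shlex.quote(path)} {dest}'
    # Assumption: directories are synced recursively with -r.
    return f'gsutil -m rsync -r {shlex.quote(path)} {dest}'
```

With this shape, `source: ~/file.txt` would take the cp branch, while a directory source keeps the current rsync behavior.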
I was starting to implement this, but I realized it seems to break our destination/source path semantics. For example, let's use the YAML above:
file_mounts:
  /test-path:
    name: sky-test-tmp
    source: ~/file.txt
    store: gcs
Is /test-path now expected to be a file which has the contents of ~/file.txt? Or is it a directory and we put file.txt at /test-path/file.txt?
- If /test-path is intended to be a file, we cannot support MOUNT mode, since most mounting tools don't allow mounting in a non-empty directory.
- If /test-path is intended to be a directory, we can maybe support it, but we will need special handling for COPY mode. We will need to maintain and pass the source (in addition to the bucket name) to the VM during file mounting to ensure only the specific file is copied to the VM (e.g., when the user uploads additional files to the bucket).
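To make the second option concrete, here is a rough sketch of what the COPY-mode handling might look like on the VM side. The function `copy_mode_download_command` and its `source_file` parameter are hypothetical names, and the assumption is that the download is issued as a gsutil shell command:

```python
import posixpath
import shlex
from typing import Optional


def copy_mode_download_command(bucket: str, mount_path: str,
                               source_file: Optional[str] = None) -> str:
    """Build the VM-side download command for COPY mode (illustrative)."""
    dst = shlex.quote(mount_path)
    if source_file is None:
        # Directory source: sync the whole bucket into the mount path.
        return f'mkdir -p {dst} && gsutil -m rsync -r gs://{bucket} {dst}'
    # Single-file source: copy only that object, so other files later
    # uploaded to the bucket are not pulled onto the VM.
    name = posixpath.basename(source_file)
    target = shlex.quote(posixpath.join(mount_path, name))
    return f'mkdir -p {dst} && gsutil cp gs://{bucket}/{name} {target}'
```

For the YAML above, this would place the file at /test-path/file.txt and leave the rest of the bucket alone.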
If the user wants to upload just one file, a workaround could be for the user to move the file to a new directory and use that as the source. Would that be feasible?
I am leaning towards keeping Storage sources limited to directories only, but happy to do option 2 if you think it's an important use case.
That is a great discussion! Option 2 sounds good to me. I was a bit afraid that moving the files around would scare users, since the file would appear to have disappeared, especially when doing the auto-translation for the file mounts in the spot case. I have started thinking about whether a hard link would work for the spot case: we hard link each file into a hidden folder and upload that folder.
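A minimal sketch of the hard-link idea, assuming the hidden folder is created on the same filesystem as the files (hard links cannot cross filesystems). The function name and the `.sky-staging-` prefix are made up for illustration:

```python
import os
import tempfile
from typing import Iterable


def stage_files_with_hard_links(files: Iterable[str], parent_dir: str) -> str:
    """Hard link each file into a new hidden folder under `parent_dir`.

    The originals stay where they are; the returned folder can be used
    as the directory source for the upload. Hypothetical helper.
    """
    staging_dir = tempfile.mkdtemp(prefix='.sky-staging-', dir=parent_dir)
    for path in files:
        src = os.path.abspath(os.path.expanduser(path))
        # os.link raises FileExistsError if two files share a basename,
        # and OSError if src and staging_dir are on different filesystems.
        os.link(src, os.path.join(staging_dir, os.path.basename(src)))
    return staging_dir
```

Because a hard link is just another name for the same data, the user's file never moves, and deleting the staging folder afterwards does not affect the original.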
Another thought: could we specify a subdirectory in the name, such as
file_mounts:
  /test-path:
    name: sky-test-tmp/file.txt
    source: ~/file.txt
    store: gcs
In this case, /test-path would be the file itself, and we could simply disallow MOUNT mode for any name that includes a subdirectory?
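As a sketch of how that proposal could be checked, assuming the name is split at the first slash into a bucket and a path inside it. Both helpers below are hypothetical and not part of the current Storage API:

```python
from typing import Optional, Tuple


def split_storage_name(name: str) -> Tuple[str, Optional[str]]:
    """Split 'bucket/sub/path' into ('bucket', 'sub/path')."""
    bucket, _, sub_path = name.partition('/')
    # A plain bucket name has no sub-path.
    return bucket, (sub_path or None)


def validate_mode(name: str, mode: str) -> None:
    """Reject MOUNT mode when the name points inside the bucket."""
    _, sub_path = split_storage_name(name)
    if sub_path is not None and mode.upper() == 'MOUNT':
        raise ValueError(
            f'MOUNT mode is not supported for a name with a sub-path: '
            f'{name!r}')
```

Under this scheme, `sky-test-tmp/file.txt` would be valid only with COPY mode, and a bare `sky-test-tmp` would behave as it does today.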
Tracked in #1226 with a deeper discussion there