skypilot
[Sky Launch] User's `workdir` name should match Cloud's `workdir` name
If the user's setup script installs their own workdir as a package (such as `pip install -e .`), the installed package will be named `sky_workdir` because Sky uploads the user's work directory under `sky_workdir`.
One fix is to use `~/sky_workdir/<user_workdir_name>`. Another is `~/<user_workdir_name>`. The former seems better because it offers some encapsulation.
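To make the two proposals concrete, here is a minimal sketch of how each remote path could be derived from the local workdir. The function names are hypothetical and are not part of Sky's codebase; only the `~/sky_workdir` location comes from this issue.

```python
from pathlib import PurePosixPath

# Remote upload directory named in this issue.
SKY_WORKDIR = "~/sky_workdir"


def nested_remote_workdir(local_workdir: str) -> str:
    """Option 1 (hypothetical helper): ~/sky_workdir/<user_workdir_name>."""
    name = PurePosixPath(local_workdir).name
    return f"{SKY_WORKDIR}/{name}"


def home_remote_workdir(local_workdir: str) -> str:
    """Option 2 (hypothetical helper): ~/<user_workdir_name>."""
    name = PurePosixPath(local_workdir).name
    return f"~/{name}"


print(nested_remote_workdir("/home/romilb/myproject/mycode"))  # ~/sky_workdir/mycode
print(home_remote_workdir("/home/romilb/myproject/mycode"))    # ~/mycode
```

Either way, the last path component keeps the user's directory name, so an editable install would be based on that directory rather than on one named `sky_workdir`.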
How about recreating the user's exact workdir path? E.g., if the workdir is `/home/romilb/myproject/mycode`, we create the exact same path on the remote instead of putting it at `~/sky_workdir`. This can be a problem if the workdir is some standard path (e.g. `/tmp`, `/bin`), but otherwise it allows users to retain absolute paths in their code.
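If we went this route, the standard-path concern could be handled with a guard along these lines. This is only a sketch: the helper name and the reserved list are assumptions for illustration, and the list is deliberately incomplete.

```python
from pathlib import PurePosixPath

# Illustrative, incomplete set of paths that should never be recreated
# or overwritten on the remote machine (an assumption, not Sky's list).
RESERVED_PATHS = frozenset({"/", "/tmp", "/bin", "/etc", "/usr", "/var"})


def can_mirror_workdir(local_workdir: str) -> bool:
    """Return True if the local path looks safe to recreate verbatim remotely."""
    path = PurePosixPath(local_workdir)
    # Only absolute paths can be mirrored; standard system paths are rejected.
    return path.is_absolute() and str(path) not in RESERVED_PATHS
```

A workdir that fails the check would fall back to the default `~/sky_workdir` location.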
That will encounter permission issues. The file_mounts helper circumvents them by symlinking from under ~/: https://github.com/sky-proj/sky/blob/47e264d9187fdc8702a4a254d85a1a7efa52dd4a/sky/backends/backend_utils.py#L75 I'm leaning towards doing the less complex thing to satisfy user asks at the moment.
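For reference, the symlink idea looks roughly like the sketch below. This is not the code from `backend_utils.py`; the function name and arguments are hypothetical, and on a real machine creating the parent of the requested path may itself need elevated permissions.

```python
import os


def expose_via_symlink(real_dir: str, requested_path: str) -> None:
    """Keep the data in real_dir and make it reachable at requested_path."""
    requested_path = requested_path.rstrip("/")
    os.makedirs(real_dir, exist_ok=True)
    parent = os.path.dirname(requested_path)
    if parent:
        # Create the directories leading up to the link, but not the link itself.
        os.makedirs(parent, exist_ok=True)
    # Replace a stale link from a previous run; real files/dirs are left alone,
    # in which case os.symlink below raises FileExistsError.
    if os.path.islink(requested_path):
        os.unlink(requested_path)
    os.symlink(real_dir, requested_path)
```

The files live under a directory the user owns, and only the link sits at the requested location.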
To add, the PR description shows an example of why preserving the workdir name is needed. However, preserving the exact path seems to only be useful if the task's going to traverse the path -- e.g., cd up one level, ls two levels up -- which we can't support because those levels aren't synced. That said, pls push back if this is explicitly requested by Daniel!
Hmm, one example case where we need the exact path is when it is hardcoded in user code (e.g. `f = open('/home/romilb/myproject/mycode/myfile.csv')`). Ideally good code shouldn't have such absolute paths, but I sometimes find them in scripts/notebooks I've written in a hurry.
> That will encounter permission issues

Would chown/chmod be sufficient to fix that?
+1 to Romil's idea. This aligns with the ultimate goal of making the remote env as similar as possible to the local env.
Seems like we should follow Docker's philosophy here. The pattern
WORKDIR /app
COPY . .
is commonly used and advertised. Here, the local workdir is `.`, which can have a complex path, while the remote workdir location is simply `/app`.
Guessing they don't imitate the paths because it's a leaky abstraction for the reason mentioned above (what if users "use" parent dirs?), which would make the container image no longer self-contained.
Here are some of the possible solutions for how to deal with the workdir after a discussion with @concretevitamin and @romilbhardwaj, but for now, we can stick with modifying the doc.