
KeyError on Syncing File Mounts When Launching Spot

Open · mjaysonnn opened this issue 9 months ago · 1 comment

I am encountering a KeyError when attempting to launch a spot job following the tutorials. The error occurs during the setup phase, when file mounts are supposed to be synced. (I get the same error every time I change or remove the file mounts.) The exact error message is:

I 04-27 00:23:21 controller_utils.py:504] Translating workdir and file_mounts with local source paths to SkyPilot Storage...
I 04-27 00:23:21 controller_utils.py:529] Workdir '.' will be synced to cloud storage 'skypilot-workdir-mj-6f6a9c5a'.
I 04-27 00:23:21 controller_utils.py:554] Folder in local file mount '.' will be synced to SkyPilot storage skypilot-filemounts-folder-mj-6f6a9c5a-0.
I 04-27 00:23:21 controller_utils.py:602] Uploading sources to cloud storage. See: sky storage ls
I 04-27 00:23:23 storage.py:1383] Created S3 bucket skypilot-workdir-mj-6f6a9c5a in eu-north-1
I 04-27 00:23:26 storage.py:1383] Created S3 bucket skypilot-filemounts-folder-mj-6f6a9c5a-0 in eu-north-1
Traceback (most recent call last):
  File "/Users/mj/anaconda3/envs/sky/bin/sky", line 8, in <module>
    sys.exit(cli())
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 354, in _record
    return f(*args, **kwargs)
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/sky/cli.py", line 805, in invoke
    return super().invoke(ctx)
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 354, in _record
    return f(*args, **kwargs)
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/sky/cli.py", line 805, in invoke
    return super().invoke(ctx)
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 375, in _record
    return f(*args, **kwargs)
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 375, in _record
    return f(*args, **kwargs)
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/sky/cli.py", line 3292, in spot_launch
    spot_lib.launch(dag,
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/sky/utils/common_utils.py", line 375, in _record
    return f(*args, **kwargs)
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/sky/spot/core.py", line 77, in launch
    controller_utils.maybe_translate_local_file_mounts_and_sync_up(
  File "/Users/mj/anaconda3/envs/sky/lib/python3.10/site-packages/sky/utils/controller_utils.py", line 609, in maybe_translate_local_file_mounts_and_sync_up
    storage = task.storage_mounts[file_mount_remote_tmp_dir]
KeyError: '/tmp/sky-spot-filemounts-files'
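The traceback ends in a plain dictionary subscript, task.storage_mounts[file_mount_remote_tmp_dir], and a Python KeyError means that key was simply never inserted into the dict. The sketch below reproduces that failure mode in isolation and contrasts it with a guarded dict.get lookup. It is an illustration only: the function names and the dict contents are hypothetical and are not SkyPilot's actual implementation.

```python
# Hypothetical illustration only: names mimic the traceback but this is NOT
# SkyPilot's code. The dict keys/values below are made up for the example.
FILE_MOUNT_REMOTE_TMP_DIR = '/tmp/sky-spot-filemounts-files'


def lookup_unguarded(storage_mounts: dict, key: str):
    # A plain subscript raises KeyError when the key was never registered.
    return storage_mounts[key]


def lookup_guarded(storage_mounts: dict, key: str):
    # dict.get returns None instead of raising, so the caller can decide
    # whether a missing entry is an error or just "nothing to sync".
    return storage_mounts.get(key)


# Hypothetical registry that has no entry for FILE_MOUNT_REMOTE_TMP_DIR.
storage_mounts = {
    '~/sky_workdir': 'skypilot-workdir-mj-6f6a9c5a',
    '.': 'skypilot-filemounts-folder-mj-6f6a9c5a-0',
}

try:
    lookup_unguarded(storage_mounts, FILE_MOUNT_REMOTE_TMP_DIR)
except KeyError as e:
    print('KeyError:', e)  # prints: KeyError: '/tmp/sky-spot-filemounts-files'

print(lookup_guarded(storage_mounts, FILE_MOUNT_REMOTE_TMP_DIR))  # prints: None
```

Since the failing line lives inside the installed package rather than in the YAML, changing file_mounts alone would not be expected to avoid it, which matches the report that the error persists when the mounts are changed or removed.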

Here is my YAML (combining the tutorials, and not using a GPU since it's only for testing):

resources:
  cloud: aws
  instance_type: m5.2xlarge

workdir: .

file_mounts:
  .: .

setup: |
  echo "Running setup."

run: |
  echo "Hello, SkyPilot!"
  conda env list
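The log lines before the traceback treat the workdir and the file_mounts "with local source paths" specially, and report that the '.' entry is a folder. As a rough aid for reading those lines, here is a minimal sketch that sorts a file_mounts mapping (remote path to source) into local folders, local files, and cloud URIs. The helper name, the output keys, and the set of recognized URI schemes are all assumptions for illustration; this is not SkyPilot's own classification logic.

```python
import os
from urllib.parse import urlparse

# Hypothetical helper: a simplified classification, NOT SkyPilot's logic.
CLOUD_SCHEMES = {'s3', 'gs'}  # assumed example schemes for remote sources


def classify_file_mounts(file_mounts: dict) -> dict:
    """Group file_mounts entries (remote path -> source) by source type."""
    groups = {'local_folders': [], 'local_files': [], 'cloud_uris': []}
    for remote, source in file_mounts.items():
        if urlparse(source).scheme in CLOUD_SCHEMES:
            groups['cloud_uris'].append(remote)
        elif os.path.isdir(os.path.expanduser(source)):
            groups['local_folders'].append(remote)
        else:
            # Anything else is treated as a local file (existence not checked).
            groups['local_files'].append(remote)
    return groups


# The YAML above mounts the current directory at '.', so its only entry is
# a local folder.
print(classify_file_mounts({'.': '.'}))
```

For this YAML the sketch reports one local folder and no single files, consistent with the single "Folder in local file mount '.'" line in the log.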

I would appreciate any guidance on how to properly configure my YAML file to avoid this error, or any updates that might resolve this issue.

Version & Commit info:

  • sky -v: skypilot, version 1.0.0.dev20240425
  • sky -c: skypilot, commit 34b55f9694a7ad635957fd3a2631e0eeed06f07e

mjaysonnn · Apr 27 '24 04:04

Hi @mjaysonnn - this was a bug introduced in #3476 and fixed in #3480. It should be fixed in nightly 1.0.0.dev20240426 and later. Can you try the latest nightly build? pip install -U skypilot-nightly
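To tell whether an installed nightly already includes the fix, the version string printed by sky -v (here 1.0.0.dev20240425) can be compared with the first fixed nightly named in this comment, 1.0.0.dev20240426. The sketch below does that comparison; the function names are hypothetical, and it assumes nightly versions always follow the X.Y.Z.devYYYYMMDD pattern seen in this thread (stable release strings are rejected rather than guessed at).

```python
import re

# First nightly reported to contain the fix, per the maintainer's comment.
FIRST_FIXED_NIGHTLY = '1.0.0.dev20240426'


def parse_nightly(version: str) -> tuple:
    """Split 'X.Y.Z.devYYYYMMDD' into a comparable tuple of ints."""
    # Assumption: nightly strings always match this pattern.
    match = re.fullmatch(r'(\d+)\.(\d+)\.(\d+)\.dev(\d{8})', version.strip())
    if match is None:
        raise ValueError(f'not a nightly version string: {version!r}')
    return tuple(int(part) for part in match.groups())


def has_fix(version: str) -> bool:
    # Tuples compare element by element, so the date suffix decides ties.
    return parse_nightly(version) >= parse_nightly(FIRST_FIXED_NIGHTLY)


print(has_fix('1.0.0.dev20240425'))  # False: the version in the report
print(has_fix('1.0.0.dev20240426'))  # True
```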

romilbhardwaj · Apr 27 '24 05:04

This issue is stale because it has been open 120 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions[bot] · Aug 27 '24 01:08

Solved it. Thanks!

mjaysonnn · Sep 04 '24 19:09