[tune] `URI has empty scheme` error when `storage_path` in `RunConfig` is relative

Open Dj-Polyester opened this issue 1 year ago • 1 comments

What happened + What you expected to happen

NOTE: I am sorry if I am spamming, but I did not know how to write the issue in my first attempt, #42968. You can remove that issue, as it points to the same bug.

Here it says: You can specify the storage_path and trainable_name:

# This logs to 2 different trial folders:
# ./results/test_experiment/trial_name_1 and ./results/test_experiment/trial_name_2
# Only trial_name is autogenerated.
tuner = tune.Tuner(trainable,
    tune_config=tune.TuneConfig(num_samples=2),
    run_config=RunConfig(storage_path="./results", name="test_experiment"))
results = tuner.fit()

However, when I use a relative path for storage_path, I get:

ArrowInvalid: URI has empty scheme [relative_path]

When I use an absolute path, it works. As a workaround, I used

from pathlib import Path
Path(relative_path).resolve()

to obtain an absolute path. I am reporting this so that you can either fix the code (1) or allow me to open a PR (2). Alternatively, you can modify the docs so that storage_path only accepts absolute paths (3). To be honest, I would prefer not to go with (3). I hope this information is useful.
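
The workaround above can be wrapped in a small helper. This is only a minimal sketch: absolute_storage_path is a hypothetical name that is not part of Ray, and the commented-out RunConfig line merely indicates where the result would be passed. It assumes a local filesystem path, not a remote URI such as s3://...

```python
from pathlib import Path


def absolute_storage_path(path: str) -> str:
    """Resolve a possibly relative local path against the current working directory."""
    # Path.resolve() makes the path absolute, resolves symlinks,
    # and collapses "." and ".." segments.
    return str(Path(path).resolve())


storage_path = absolute_storage_path("./results")
print(storage_path)

# Assumed usage with the docs snippet (requires Ray, not executed here):
# run_config = RunConfig(storage_path=storage_path, name="test_experiment")
```

Note that the path is resolved against the working directory of the process that calls the helper, so it should be called in the script that launches the experiment.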

Versions / Dependencies

When I do pip list:

Package                   Version
------------------------- -----------
aiosignal                 1.3.1
asttokens                 2.4.1
attrs                     23.2.0
bayesian-optimization     1.4.3
certifi                   2024.2.2
charset-normalizer        3.3.2
click                     8.1.7
colorama                  0.4.6
comm                      0.2.1
contourpy                 1.2.0
cycler                    0.12.1
debugpy                   1.8.0
decorator                 5.1.1
executing                 2.0.1
filelock                  3.13.1
fonttools                 4.47.2
frozenlist                1.4.1
fsspec                    2023.12.2
idna                      3.6
ipykernel                 6.29.0
ipython                   8.21.0
jedi                      0.19.1
Jinja2                    3.1.3
joblib                    1.3.2
jsonschema                4.21.1
jsonschema-specifications 2023.12.1
jupyter_client            8.6.0
jupyter_core              5.7.1
kiwisolver                1.4.5
lightning-utilities       0.10.1
MarkupSafe                2.1.5
matplotlib                3.8.2
matplotlib-inline         0.1.6
mpmath                    1.3.0
msgpack                   1.0.7
nest-asyncio              1.6.0
networkx                  3.2.1
numpy                     1.26.3
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         8.9.2.26
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu12          2.19.3
nvidia-nvjitlink-cu12     12.3.101
nvidia-nvtx-cu12          12.1.105
packaging                 23.2
pandas                    2.2.0
parso                     0.8.3
pexpect                   4.9.0
pillow                    10.2.0
pip                       23.2.1
pipdeptree                2.13.2
platformdirs              4.2.0
prompt-toolkit            3.0.43
protobuf                  4.25.2
psutil                    5.9.8
ptyprocess                0.7.0
pure-eval                 0.2.2
pyarrow                   15.0.0
Pygments                  2.17.2
pyparsing                 3.1.1
python-dateutil           2.8.2
pytz                      2024.1
PyYAML                    6.0.1
pyzmq                     25.1.2
ray                       2.9.1
referencing               0.33.0
requests                  2.31.0
rpds-py                   0.17.1
scikit-learn              1.4.0
scipy                     1.12.0
setuptools                65.5.0
six                       1.16.0
stack-data                0.6.3
sympy                     1.12
tensorboardX              2.6.2.2
threadpoolctl             3.2.0
torch                     2.2.0
torchmetrics              1.3.0.post0
torchvision               0.17.0
tornado                   6.4
traitlets                 5.14.1
triton                    2.2.0
typing_extensions         4.9.0
tzdata                    2023.4
urllib3                   2.2.0
wcwidth                   0.2.13

Also, python gives:

Python 3.11.5 (main, Sep  2 2023, 14:16:33) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.

Reproduction script

You can reproduce the behavior with the code snippet above, taken from the docs. If you use a relative path for storage_path, you get pyarrow's ArrowInvalid error. If you use an absolute path, it works.

Issue Severity

Low: It annoys or frustrates me.

Dj-Polyester, Feb 04 '24 13:02

Can confirm that using an absolute path fixes the issue.

qazi0, Feb 20 '24 17:02

Hey @Dj-Polyester, we made this change in Ray 2.7 when simplifying the Ray Train/Tune storage backend to only use pyarrow with as few "layers" on top as possible. So, this storage_path gets fed directly into a pyarrow.fs.FileSystem, without extra checks -- and the pyarrow API doesn't allow for a relative path by itself.

I am thinking of this solution: Documentation change PLUS raising a better exception telling you to call Path(...).resolve() before passing it into storage_path.

The main issue with auto-resolving relative paths into absolute paths is that the behavior on workers is a little ambiguous:

  • The current working directory on the workers may differ from that of the process you launch the training job from. (For example, Ray Tune will change your worker cwd to the trial directory by default.)
  • In this case, if the user storage path is "./relpath", then it's unclear whether you should use {driver_cwd}/relpath or {worker_cwd}/relpath as the result/checkpoint destination.

What do you think?

justinvyu, Feb 20 '24 19:02
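
The ambiguity described in the comment above can be illustrated with plain path arithmetic. This is a sketch, not Ray code: resolve_against is a hypothetical helper, and both directories are made-up values chosen only for illustration.

```python
from pathlib import PurePosixPath


def resolve_against(cwd: str, storage_path: str) -> str:
    """Join a relative storage path onto a given working directory."""
    # PurePosixPath does pure string manipulation; nothing touches the disk.
    return str(PurePosixPath(cwd) / storage_path)


# Hypothetical directories: where the job is launched vs. where a trial runs.
driver_cwd = "/home/user/project"
worker_cwd = "/tmp/trials/trial_0001"

driver_target = resolve_against(driver_cwd, "./relpath")
worker_target = resolve_against(worker_cwd, "./relpath")

print(driver_target)  # /home/user/project/relpath
print(worker_target)  # /tmp/trials/trial_0001/relpath
```

The same "./relpath" string ends up pointing at two different destinations, which is why resolving it automatically would require picking one of the two interpretations.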

@justinvyu I completely agree about the issues that can arise from over-engineering the automatic resolution of relative paths into absolute paths, given the distributed nature of the task. I would suggest instead simply detecting that a relative path was passed and printing a meaningful error message asking the user to pass an absolute path. Besides, in most cases, getting the absolute path isn't that difficult either (pwd).

What do you think?

qazi0, Feb 20 '24 19:02
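
The check proposed in the last two comments could look roughly like the following. This is a sketch under stated assumptions, not Ray's implementation: check_storage_path is a hypothetical name, the error wording is invented, and treating any string containing "://" as a URI is a simplification.

```python
from pathlib import Path


def check_storage_path(storage_path: str) -> str:
    """Reject relative local paths with an actionable error message."""
    # Assumption: anything with a URI scheme (e.g. "s3://bucket/dir")
    # is passed through untouched.
    if "://" in storage_path:
        return storage_path
    if not Path(storage_path).is_absolute():
        raise ValueError(
            f"storage_path must be an absolute path or a URI, got {storage_path!r}. "
            "Resolve it first, for example with Path(storage_path).resolve()."
        )
    return storage_path


try:
    check_storage_path("./results")
except ValueError as err:
    print(err)
```

Raising early like this would replace the pyarrow "URI has empty scheme" message with one that tells the user what to do, while leaving the choice of base directory to the user.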