[tune] `URI has empty scheme` error when `storage_path` in `RunConfig` is relative
What happened + What you expected to happen
NOTE: I am sorry if I am spamming, but I did not know how to write the issue properly in my first attempt, #42968. You can remove that issue, as it points to the same bug.
Here it says you can specify the `storage_path` and `trainable_name`:
```python
# This logs to 2 different trial folders:
# ./results/test_experiment/trial_name_1 and ./results/test_experiment/trial_name_2
# Only trial_name is autogenerated.
tuner = tune.Tuner(trainable,
    tune_config=tune.TuneConfig(num_samples=2),
    run_config=RunConfig(storage_path="./results",
                         name="test_experiment"))
results = tuner.fit()
```
However, when I use a relative path for `storage_path`, I get:

```
ArrowInvalid: URI has empty scheme [relative_path]
```
When I use an absolute path, it works. As a workaround, I used

```python
from pathlib import Path
Path(relative_path).resolve()
```

to obtain an absolute path. I am reporting this so that you can either fix the code (1) or let me open a PR (2). Alternatively, you could modify the docs so that `storage_path` only accepts absolute paths (3). To be honest, I don't prefer (3). I hope the information is useful.
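For completeness, here is the docs snippet with that workaround applied. This is just a sketch assuming Ray 2.9 (where `RunConfig` is importable from `ray.train`), with a dummy trainable added so the example runs end to end:

```python
from pathlib import Path

from ray import tune
from ray.train import RunConfig


def trainable(config):
    # Dummy trainable just to make the example runnable.
    return {"score": 1}


tuner = tune.Tuner(
    trainable,
    tune_config=tune.TuneConfig(num_samples=2),
    run_config=RunConfig(
        # Resolving the relative path up front avoids the ArrowInvalid error.
        storage_path=str(Path("./results").resolve()),
        name="test_experiment",
    ),
)
results = tuner.fit()
```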
Versions / Dependencies
When I run `pip list`, I get:

```
Package Version
------------------------- -----------
aiosignal 1.3.1
asttokens 2.4.1
attrs 23.2.0
bayesian-optimization 1.4.3
certifi 2024.2.2
charset-normalizer 3.3.2
click 8.1.7
colorama 0.4.6
comm 0.2.1
contourpy 1.2.0
cycler 0.12.1
debugpy 1.8.0
decorator 5.1.1
executing 2.0.1
filelock 3.13.1
fonttools 4.47.2
frozenlist 1.4.1
fsspec 2023.12.2
idna 3.6
ipykernel 6.29.0
ipython 8.21.0
jedi 0.19.1
Jinja2 3.1.3
joblib 1.3.2
jsonschema 4.21.1
jsonschema-specifications 2023.12.1
jupyter_client 8.6.0
jupyter_core 5.7.1
kiwisolver 1.4.5
lightning-utilities 0.10.1
MarkupSafe 2.1.5
matplotlib 3.8.2
matplotlib-inline 0.1.6
mpmath 1.3.0
msgpack 1.0.7
nest-asyncio 1.6.0
networkx 3.2.1
numpy 1.26.3
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.3.101
nvidia-nvtx-cu12 12.1.105
packaging 23.2
pandas 2.2.0
parso 0.8.3
pexpect 4.9.0
pillow 10.2.0
pip 23.2.1
pipdeptree 2.13.2
platformdirs 4.2.0
prompt-toolkit 3.0.43
protobuf 4.25.2
psutil 5.9.8
ptyprocess 0.7.0
pure-eval 0.2.2
pyarrow 15.0.0
Pygments 2.17.2
pyparsing 3.1.1
python-dateutil 2.8.2
pytz 2024.1
PyYAML 6.0.1
pyzmq 25.1.2
ray 2.9.1
referencing 0.33.0
requests 2.31.0
rpds-py 0.17.1
scikit-learn 1.4.0
scipy 1.12.0
setuptools 65.5.0
six 1.16.0
stack-data 0.6.3
sympy 1.12
tensorboardX 2.6.2.2
threadpoolctl 3.2.0
torch 2.2.0
torchmetrics 1.3.0.post0
torchvision 0.17.0
tornado 6.4
traitlets 5.14.1
triton 2.2.0
typing_extensions 4.9.0
tzdata 2023.4
urllib3 2.2.0
wcwidth 0.2.13
```
Also, running `python` gives:

```
Python 3.11.5 (main, Sep 2 2023, 14:16:33) [GCC 13.2.1 20230801] on linux
Type "help", "copyright", "credits" or "license" for more information.
```
Reproduction script
You can reproduce the behavior with the code snippet above, taken from the docs. If you use a relative path for `storage_path`, you get pyarrow's `ArrowInvalid` error. If you use an absolute path, it works.
Issue Severity
Low: It annoys or frustrates me.
Can confirm that using an absolute path fixes the issue.
Hey @Dj-Polyester, we made this change in Ray 2.7 when simplifying the Ray Train/Tune storage backend to only use `pyarrow` with as few "layers" on top as possible. So, this `storage_path` gets fed directly into a `pyarrow.fs.FileSystem`, without extra checks -- and the pyarrow API doesn't allow for a relative path by itself.
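Roughly what happens, as far as I understand it (using `pyarrow.fs.FileSystem.from_uri` here purely for illustration; Ray's internal call path may differ):

```python
import pyarrow.fs

# A bare relative path has no URI scheme, so pyarrow rejects it:
try:
    pyarrow.fs.FileSystem.from_uri("results")
except Exception as e:
    print(type(e).__name__, e)  # e.g. ArrowInvalid: URI has empty scheme: 'results'

# An absolute local path (or an explicit scheme such as s3://) is accepted:
fs, path = pyarrow.fs.FileSystem.from_uri("/tmp/results")
print(type(fs).__name__, path)  # LocalFileSystem /tmp/results
```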
I am thinking of this solution: a documentation change PLUS raising a better exception telling you to call `Path(...).resolve()` before passing it into `storage_path`.
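Something along these lines, purely as a hypothetical sketch of that check (the helper name and error wording are made up, not Ray's actual code):

```python
import os
from urllib.parse import urlparse


def _check_storage_path(storage_path: str) -> str:
    """Hypothetical validation: reject scheme-less relative paths early."""
    # URIs with an explicit scheme (s3://, gs://, file://, ...) pass through.
    if urlparse(storage_path).scheme:
        return storage_path
    # Local paths must be absolute, otherwise pyarrow later fails with ArrowInvalid.
    if not os.path.isabs(storage_path):
        raise ValueError(
            f"storage_path={storage_path!r} is a relative path. "
            "Please pass an absolute path, e.g. str(Path(storage_path).resolve())."
        )
    return storage_path
```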
The main issue with auto-resolving relative paths into absolute paths is that the behavior on workers is a little ambiguous:
- The current working directory of the workers may be different from that of the process you launch the training job from. (For example, Ray Tune will change your worker cwd to the trial directory by default.)
- In this case, if the user storage path is `"./relpath"`, then it's unclear whether you should use `{driver_cwd}/relpath` or `{worker_cwd}/relpath` as the result/checkpoint destination (see the sketch below).
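To make the ambiguity concrete, a tiny hypothetical example (the directories are made up):

```python
from pathlib import Path

driver_cwd = "/home/me/project"                 # where the job was launched
worker_cwd = "/tmp/ray_results/exp/trial_0001"  # Tune chdirs workers to the trial dir

print(Path(driver_cwd, "relpath"))  # /home/me/project/relpath
print(Path(worker_cwd, "relpath"))  # /tmp/ray_results/exp/trial_0001/relpath
```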
What do you think?
@justinvyu I completely agree with the issues that can arise from trying to over-engineer the automatic resolution of relative paths into absolute paths, given the distributed nature of the task. Instead, I would suggest simply detecting that a relative path was passed by the user and printing a meaningful error message asking them to pass an absolute path instead. Besides, in most cases, getting the absolute path isn't that difficult either (`pwd`).

What do you think?