🐛 Bug Description
To Reproduce
Steps to reproduce the behavior:
- Install Developer RDAgent Environment
- Run rdagent fin_factor
- RD-Agent starts, but the console output shows the workflow stops partway through:
Workflow Progress: 60%|██████████████████████████████████████████████████████████████████████████████████████████████████▍ | 3/5 [01:12<00:48, 24.15s/step, loop_inde
Expected Behavior
The workflow should complete to 100%.
Screenshot

Environment
Note: Users can run rdagent collect_info to get system information and paste it directly here.
2024-10-20 13:42:04.296 | WARNING | rdagent.oai.llm_utils::47 - llama is not installed.
2024-10-20 13:42:06.128 | INFO | rdagent.app.utils.info:sys_info:22 - Name of current operating system: Linux
2024-10-20 13:42:06.130 | INFO | rdagent.app.utils.info:sys_info:22 - Processor architecture: x86_64
2024-10-20 13:42:06.132 | INFO | rdagent.app.utils.info:sys_info:22 - System, version, and hardware information: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
2024-10-20 13:42:06.134 | INFO | rdagent.app.utils.info:sys_info:22 - Version number of the system: #1 SMP Thu Jan 11 04:09:03 UTC 2024
2024-10-20 13:42:06.136 | INFO | rdagent.app.utils.info:python_info:29 - Python version: 3.10.15 (main, Oct 3 2024, 07:27:34) [GCC 11.2.0]
2024-10-20 13:42:06.158 | INFO | rdagent.app.utils.info:docker_info:39 - Container ID: b1bbfff94ab771deaf613c3f49c4be0c6f3aeb9f6386a39427f28e57cf11dd9a
2024-10-20 13:42:06.160 | INFO | rdagent.app.utils.info:docker_info:40 - Container Name: silly_pascal
2024-10-20 13:42:06.163 | INFO | rdagent.app.utils.info:docker_info:41 - Container Status: exited
2024-10-20 13:42:06.167 | INFO | rdagent.app.utils.info:docker_info:42 - Image ID used by the container: sha256:d7acfebd19e095fca73b2b6ab010fec820a1d45017f767cdf2ccd42bef52258a
2024-10-20 13:42:06.170 | INFO | rdagent.app.utils.info:docker_info:43 - Image tag used by the container: ['local_qlib:latest']
2024-10-20 13:42:06.174 | INFO | rdagent.app.utils.info:docker_info:44 - Container port mapping: {}
2024-10-20 13:42:06.177 | INFO | rdagent.app.utils.info:docker_info:45 - Container Label: {'com.nvidia.volumes.needed': 'nvidia_driver', 'org.opencontainers.image.ref.name': 'ubuntu', 'org.opencontainers.image.version': '22.04'}
2024-10-20 13:42:06.181 | INFO | rdagent.app.utils.info:docker_info:46 - Startup Commands: nvidia-smi
2024-10-20 13:42:06.186 | INFO | rdagent.app.utils.info:rdagent_info:54 - RD-Agent version: 0.2.2.dev135
2024-10-20 13:42:06.827 | INFO | rdagent.app.utils.info:rdagent_info:76 - Package version: ['pydantic-settings==2.1.0', 'typer==0.9.0', 'cython==3.0.7', 'scipy==1.11.4', 'python-Levenshtein==0.25.1', 'scikit-learn==1.5.1', 'filelock==3.13.1', 'loguru-mypy==0.0.4', 'loguru==0.7.2', 'fire==0.5.0', 'fuzzywuzzy==0.18.0', 'openai==1.6.1', 'ruamel-yaml==0.18.5', 'torch==2.1.2', 'torch_geometric==2.5.3', 'tabulate==0.9.0', 'numpy==1.26.2', 'pandas==2.1.4', 'pandarallel==1.6.5', 'feedparser==6.0.11', 'matplotlib==3.9.1', 'langchain==0.0.353', 'langchain-community==0.0.7', 'tiktoken==0.7.0', 'pymupdf==1.24.9', 'azure-identity==1.17.1', 'pypdf==3.17.4', 'azure-core==1.29.6', 'azure-ai-formrecognizer==3.3.2', 'statsmodels==0.14.2', 'tables==3.9.2', 'tree-sitter-python==0.21.0', 'tree-sitter==0.22.3', 'jupyter==1.0.0', 'python-dotenv==1.0.0', 'docker==7.1.0', 'streamlit==1.39.0', 'plotly==5.24.1', 'st-theme==1.2.3', 'selenium==4.25.0', 'kaggle==1.6.17', 'nbformat==5.10.4', 'seaborn==0.13.2', 'setuptools-scm==8.0.4', 'xgboost==2.1.1', 'lightgbm==4.5.0']
Additional Notes
- The qlib_res.csv file has no data.
- The file "portfolio_analysis/report_normal_1day.pkl" is not found.
- The experiment backtesting fails, so the overall RD-Agent process fails (a quick check of both files above is sketched below).
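For reference, a minimal sketch to confirm the two symptoms above. The paths are assumptions based on a typical RD-Agent experiment workspace layout and should be adjusted to the directory of the failing run:

```python
# Hypothetical sanity check for the Qlib backtest outputs mentioned above.
# The workspace path is an assumption; point it at the failing experiment's directory.
from pathlib import Path

import pandas as pd

workspace = Path("./workspace")  # assumed root of the failing experiment workspace

csv_path = workspace / "qlib_res.csv"
pkl_path = workspace / "portfolio_analysis" / "report_normal_1day.pkl"

if csv_path.exists():
    df = pd.read_csv(csv_path)
    print(f"qlib_res.csv rows: {len(df)}")  # 0 rows reproduces the 'no data' symptom
else:
    print("qlib_res.csv is missing")

print(f"report_normal_1day.pkl exists: {pkl_path.exists()}")
```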
Hello Garcia,
From the screenshot you provided, it appears that there might be an issue with the input factor file, which then led to the error in the Qlib backtest. Could you please check if the combined_factors_df.pkl file was successfully generated during this iteration of the loop? This will help us assist you better in resolving the issue.
Thank you.
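A minimal sketch of that check, assuming the file sits in the experiment workspace of the current loop iteration (the path below is an assumption, not the fixed RD-Agent layout):

```python
# Verify that combined_factors_df.pkl was produced in the current loop iteration.
# The location is assumed; use the workspace directory of the failing run.
from pathlib import Path

import pandas as pd

factor_file = Path("./workspace/combined_factors_df.pkl")  # assumed location

if factor_file.exists():
    factors = pd.read_pickle(factor_file)
    print(factors.shape)   # an empty frame would explain the backtest failure
    print(factors.head())
else:
    print("combined_factors_df.pkl was not generated in this loop iteration")
```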
Same error here. I found that the model cannot correctly access the GPU in the Docker container, so no model weight .pkl file is generated. I am still investigating.
```
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
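An "invalid device ordinal" error usually means the code requested a CUDA device index that is not visible inside the container. A small diagnostic sketch that only prints what PyTorch can actually see (nothing here is RD-Agent-specific):

```python
# Print GPU visibility inside the container to check whether the requested
# device ordinal can exist at all.
import os

import torch

print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("torch.cuda.is_available():", torch.cuda.is_available())
print("torch.cuda.device_count():", torch.cuda.device_count())

for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))

# If device_count() is 0, or smaller than the ordinal the model code asks for,
# the container likely lacks GPU access (e.g. started without '--gpus all' or
# without the NVIDIA container runtime).
```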