🐛 Bug Description
To Reproduce
Steps to reproduce the behavior:
- Install Developer RDAgent Environment
- Run rdagent fin_factor
- RD-Agent starts, but the console output shows the workflow stops partway through:
Workflow Progress: 60%|██████████████████████████████████████████████████████████████████████████████████████████████████▍ | 3/5 [01:12<00:48, 24.15s/step, loop_inde
Expected Behavior
The workflow should complete to 100%.
Screenshot

Environment
Note: Users can run rdagent collect_info to get system information and paste it directly here.
2024-10-20 13:42:04.296 | WARNING | rdagent.oai.llm_utils::47 - llama is not installed.
2024-10-20 13:42:06.128 | INFO | rdagent.app.utils.info:sys_info:22 - Name of current operating system: Linux
2024-10-20 13:42:06.130 | INFO | rdagent.app.utils.info:sys_info:22 - Processor architecture: x86_64
2024-10-20 13:42:06.132 | INFO | rdagent.app.utils.info:sys_info:22 - System, version, and hardware information: Linux-5.15.146.1-microsoft-standard-WSL2-x86_64-with-glibc2.39
2024-10-20 13:42:06.134 | INFO | rdagent.app.utils.info:sys_info:22 - Version number of the system: #1 SMP Thu Jan 11 04:09:03 UTC 2024
2024-10-20 13:42:06.136 | INFO | rdagent.app.utils.info:python_info:29 - Python version: 3.10.15 (main, Oct 3 2024, 07:27:34) [GCC 11.2.0]
2024-10-20 13:42:06.158 | INFO | rdagent.app.utils.info:docker_info:39 - Container ID: b1bbfff94ab771deaf613c3f49c4be0c6f3aeb9f6386a39427f28e57cf11dd9a
2024-10-20 13:42:06.160 | INFO | rdagent.app.utils.info:docker_info:40 - Container Name: silly_pascal
2024-10-20 13:42:06.163 | INFO | rdagent.app.utils.info:docker_info:41 - Container Status: exited
2024-10-20 13:42:06.167 | INFO | rdagent.app.utils.info:docker_info:42 - Image ID used by the container: sha256:d7acfebd19e095fca73b2b6ab010fec820a1d45017f767cdf2ccd42bef52258a
2024-10-20 13:42:06.170 | INFO | rdagent.app.utils.info:docker_info:43 - Image tag used by the container: ['local_qlib:latest']
2024-10-20 13:42:06.174 | INFO | rdagent.app.utils.info:docker_info:44 - Container port mapping: {}
2024-10-20 13:42:06.177 | INFO | rdagent.app.utils.info:docker_info:45 - Container Label: {'com.nvidia.volumes.needed': 'nvidia_driver', 'org.opencontainers.image.ref.name': 'ubuntu', 'org.opencontainers.image.version': '22.04'}
2024-10-20 13:42:06.181 | INFO | rdagent.app.utils.info:docker_info:46 - Startup Commands: nvidia-smi
2024-10-20 13:42:06.186 | INFO | rdagent.app.utils.info:rdagent_info:54 - RD-Agent version: 0.2.2.dev135
2024-10-20 13:42:06.827 | INFO | rdagent.app.utils.info:rdagent_info:76 - Package version: ['pydantic-settings==2.1.0', 'typer==0.9.0', 'cython==3.0.7', 'scipy==1.11.4', 'python-Levenshtein==0.25.1', 'scikit-learn==1.5.1', 'filelock==3.13.1', 'loguru-mypy==0.0.4', 'loguru==0.7.2', 'fire==0.5.0', 'fuzzywuzzy==0.18.0', 'openai==1.6.1', 'ruamel-yaml==0.18.5', 'torch==2.1.2', 'torch_geometric==2.5.3', 'tabulate==0.9.0', 'numpy==1.26.2', 'pandas==2.1.4', 'pandarallel==1.6.5', 'feedparser==6.0.11', 'matplotlib==3.9.1', 'langchain==0.0.353', 'langchain-community==0.0.7', 'tiktoken==0.7.0', 'pymupdf==1.24.9', 'azure-identity==1.17.1', 'pypdf==3.17.4', 'azure-core==1.29.6', 'azure-ai-formrecognizer==3.3.2', 'statsmodels==0.14.2', 'tables==3.9.2', 'tree-sitter-python==0.21.0', 'tree-sitter==0.22.3', 'jupyter==1.0.0', 'python-dotenv==1.0.0', 'docker==7.1.0', 'streamlit==1.39.0', 'plotly==5.24.1', 'st-theme==1.2.3', 'selenium==4.25.0', 'kaggle==1.6.17', 'nbformat==5.10.4', 'seaborn==0.13.2', 'setuptools-scm==8.0.4', 'xgboost==2.1.1', 'lightgbm==4.5.0']
Additional Notes
- The qlib_res.csv file has no data.
- The file "portfolio_analysis/report_normal_1day.pkl" is not found.
- The experiment backtesting fails, so the overall RD-Agent process fails (a quick check of both files above is sketched below).
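For reference, a minimal sketch to confirm the two symptoms above. The paths are assumptions based on a typical RD-Agent experiment workspace layout and should be adjusted to the directory of the failing run:

```python
# Hypothetical sanity check for the Qlib backtest outputs mentioned above.
# The workspace path is an assumption; point it at the failing experiment's directory.
from pathlib import Path

import pandas as pd

workspace = Path("./workspace")  # assumed root of the failing experiment workspace

csv_path = workspace / "qlib_res.csv"
pkl_path = workspace / "portfolio_analysis" / "report_normal_1day.pkl"

if csv_path.exists():
    df = pd.read_csv(csv_path)
    print(f"qlib_res.csv rows: {len(df)}")  # 0 rows reproduces the 'no data' symptom
else:
    print("qlib_res.csv is missing")

print(f"report_normal_1day.pkl exists: {pkl_path.exists()}")
```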
Hello Garcia,
From the screenshot you provided, it appears that there might be an issue with the input factor file, which then led to the error in the Qlib backtest. Could you please check if the combined_factors_df.pkl file was successfully generated during this iteration of the loop? This will help us assist you better in resolving the issue.
Thank you.
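A minimal sketch of that check, assuming the file sits in the experiment workspace of the current loop iteration (the path below is an assumption, not the fixed RD-Agent layout):

```python
# Verify that combined_factors_df.pkl was produced in the current loop iteration.
# The location is assumed; use the workspace directory of the failing run.
from pathlib import Path

import pandas as pd

factor_file = Path("./workspace/combined_factors_df.pkl")  # assumed location

if factor_file.exists():
    factors = pd.read_pickle(factor_file)
    print(factors.shape)   # an empty frame would explain the backtest failure
    print(factors.head())
else:
    print("combined_factors_df.pkl was not generated in this loop iteration")
```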
Same error here. I found that the model cannot correctly access the GPU in the Docker container, so no model weight .pkl file is generated. I am still investigating.
```
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
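An "invalid device ordinal" error usually means the code requested a CUDA device index that is not visible inside the container. A small diagnostic sketch that only prints what PyTorch can actually see (nothing here is RD-Agent-specific):

```python
# Print GPU visibility inside the container to check whether the requested
# device ordinal can exist at all.
import os

import torch

print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("torch.cuda.is_available():", torch.cuda.is_available())
print("torch.cuda.device_count():", torch.cuda.device_count())

for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))

# If device_count() is 0, or smaller than the ordinal the model code asks for,
# the container likely lacks GPU access (e.g. started without '--gpus all' or
# without the NVIDIA container runtime).
```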