RD-Agent fails to run on Windows 11 due to Docker mount path and environment issues
🐛 Bug Description
On a standard Windows 11 machine with Docker Desktop (using the WSL 2 backend) and a clean Miniconda installation, rdagent fails to execute any command (e.g., fin_quant, collect_info). The initial error points to an incorrect Docker volume mount path, and subsequent attempts to fix this reveal deeper issues related to environment path resolution between PowerShell, Conda, and the Python script. The program either fails with a Docker 500 Internal Server Error or exits silently without any error message after printing the initial configuration log.
To Reproduce
Steps to reproduce the behavior:
- Environment Setup:
  - OS: Windows 11
  - Terminal: PowerShell
  - Python Management: Miniconda
  - Containerization: Docker Desktop (latest version, with WSL 2 backend)
- Installation:
  - Create a new Conda environment: conda create -n rdagent-env python=3.10
  - Activate the environment: conda activate rdagent-env
  - Install the package: pip install rdagent
- Configuration:
  - Create a working directory (e.g., RD-Agent-Work-Folder).
  - Inside the directory, create a .env file with the necessary API keys and model configurations.
- Execution:
  - From the working directory, attempt to run any rdagent command, e.g.: rdagent fin_quant
Expected Behavior
The rdagent command should correctly initialize, build the necessary Docker container (local_qlib:latest), and proceed with the task execution. The Docker volume mounts should be compatible with the Windows host file system.
Screenshot
The primary error received during multiple attempts:
docker.errors.APIError: 500 Server Error for http+docker://localnpipe/v1.51/containers/create: Internal Server Error ("mount denied:
the source path "/tmp/full:C:\\workspace\\qlib_workspace\\workspace_cache:rw"
too many colons")
After we attempted to patch the source code, the subsequent failures were either a ModuleNotFoundError or a silent exit right after the configuration log was printed, with no error message at all.
Environment
Note: The rdagent collect_info command fails to run (exits silently), so this information is gathered manually.
- Name of current operating system: Windows 11
- Processor architecture: x64
- Python version (in Conda env): 3.10
- RD-Agent version: the version installed by pip install rdagent
- Docker-py version: 7.1.0
- Conda environment: clean environment with only rdagent and its dependencies installed
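Because rdagent collect_info exits silently, the same details can be gathered with a short standard-library script run by the Conda environment's own interpreter. This is a minimal sketch, not part of rdagent: collect_env_info is a hypothetical name, and it assumes the PyPI distribution names are rdagent and docker (docker-py).

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_env_info(packages: tuple[str, ...] = ("rdagent", "docker")) -> dict[str, str]:
    """Gather OS, architecture, Python and package versions (hypothetical helper)."""
    info = {
        "os": platform.platform(),        # OS name plus build string
        "machine": platform.machine(),    # processor architecture, e.g. AMD64
        "python": platform.python_version(),
    }
    for pkg in packages:
        try:
            info[pkg] = version(pkg)      # installed distribution version
        except PackageNotFoundError:
            info[pkg] = "not installed"   # not visible to this interpreter
    return info


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

If rdagent shows up as "not installed" here, the interpreter running the script is not the one pip installed into.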
Additional Notes
This issue appears to stem from a fundamental incompatibility with the Windows environment. The debugging process involved several steps:
- Initial Diagnosis: The "too many colons" error strongly suggests that a hardcoded Linux path (/tmp/full) in rdagent/utils/env.py is being used as the source of a Docker volume mount on a Windows host.
- Attempted Fix 1 (Code Patch): We manually edited rdagent/utils/env.py to use platform.system() to check for "Windows" and then use tempfile.mkdtemp() to create a compatible temporary directory. This appeared to have no effect, suggesting the changes were not being picked up.
- Attempted Fix 2 (Cache Clearing): Suspecting a Python bytecode caching issue (.pyc files), we manually deleted all __pycache__ directories within the rdagent site-packages folder. This also did not resolve the issue.
- Attempted Fix 3 (Direct Execution): We encountered significant path resolution issues with PowerShell. conda activate did not reliably place the environment's Python first on the PATH. Attempts to run rdagent more directly (python -m rdagent.app.cli... or the absolute path to the Conda environment's python.exe) led to ModuleNotFoundError or "can't open file" errors.
- Final State: The core problem seems to be that rdagent's startup and environment management code is not robust enough for a standard Windows/PowerShell/Conda setup. The silent exit of all commands suggests that a pre-flight check fails before the main application logic (and any Docker interaction) even begins.
A temporary diagnostic script (direct_run.py) was also created to bypass the CLI and directly instantiate the core application logic, but even running this simple script proved impossible due to the persistent path and environment issues.
Status Update: Resolved
Root Cause Analysis
The core problem was traced back to the normalize_volumes function within rdagent/utils/env.py. On Windows, the function was incorrectly converting container-side POSIX paths (e.g., /workspace/qlib_workspace/) into absolute Windows paths (e.g., C:\workspace\...).
This resulted in an invalid volume mount string for Docker, such as C:\Users\...\work_dir:C:\workspace\...:rw, which contains too many colons and triggered Docker's mount denied error.
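The colon count can be reproduced on any OS with pure path objects. This is an illustrative sketch: bind_string is a hypothetical helper that assembles a host:container:mode spec, not rdagent's or docker-py's code, and the buggy Windows re-rooting of the container path is simulated with PureWindowsPath rather than by calling the real normalize_volumes.

```python
from pathlib import PurePosixPath, PureWindowsPath


def bind_string(host: str, container: str, mode: str = "rw") -> str:
    """Assemble a 'host:container:mode' bind spec (hypothetical helper)."""
    return f"{host}:{container}:{mode}"


container = "/workspace/qlib_workspace/workspace_cache"

# Buggy behaviour: the container path is treated as a host path and
# re-rooted on drive C: with backslashes.
buggy = str(PureWindowsPath("C:/", *PurePosixPath(container).parts[1:]))
# Fixed behaviour: the container path stays in POSIX form.
fixed = str(PurePosixPath(container))

buggy_spec = bind_string("/tmp/full", buggy)
fixed_spec = bind_string("/tmp/full", fixed)

print(buggy_spec, buggy_spec.count(":"))  # 3 colons, as in the error above
print(fixed_spec, fixed_spec.count(":"))  # 2 colons: host, container, mode
```

The buggy spec is byte-for-byte the source path quoted in the mount denied error, which is why keeping container paths POSIX-style is enough to remove the extra colon.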
The Fix
A patch was applied directly to the normalize_volumes function to differentiate between host and container path handling:
- Host Paths: Are now always resolved to their absolute native filesystem path (e.g., C:\Users\...).
- Container Paths: Are now explicitly converted to POSIX-style paths, so the Docker daemon on Windows cannot misinterpret them.
This ensures the final mount instruction is always correctly formatted, regardless of the host operating system.
On Windows, changing the normalize_volumes function in rdagent/utils/env.py as follows makes it work:
```python
import os
from pathlib import PurePosixPath


def normalize_volumes(vols: dict[str, str | dict[str, str]], working_dir: str) -> dict:
    abs_vols: dict[str, str | dict[str, str]] = {}

    def to_abs_host(path: str) -> str:
        # Resolve the host path to an absolute native path (e.g. C:\Users\... on Windows)
        return os.path.abspath(path)

    def to_abs_posix_container(path: str) -> str:
        # Convert the container path to an absolute POSIX path:
        # - if it is already absolute, keep it as a POSIX path;
        # - if it is relative, join it onto working_dir.
        # The absoluteness check uses POSIX rules so the host OS cannot reinterpret it.
        container_path = PurePosixPath(path)
        if container_path.is_absolute():
            return str(container_path)
        # working_dir is always an absolute path inside the container
        return str(PurePosixPath(working_dir) / container_path)

    for lp, vinfo in vols.items():
        abs_host = to_abs_host(lp)
        if isinstance(vinfo, dict):
            # Copy so the caller's mapping is not mutated
            vinfo = vinfo.copy()
            vinfo["bind"] = to_abs_posix_container(vinfo["bind"])
            abs_vols[abs_host] = vinfo
        else:
            abs_vols[abs_host] = to_abs_posix_container(vinfo)
    return abs_vols
```
Thanks, it solved the problem.