RD-Agent fails to run on Windows 11 due to Docker mount path and environment issues

🐛 Bug Description

On a standard Windows 11 machine with Docker Desktop (using the WSL 2 backend) and a clean Miniconda installation, rdagent fails to execute any command (e.g., fin_quant, collect_info). The initial error points to an incorrect Docker volume mount path, and subsequent attempts to fix this reveal deeper issues related to environment path resolution between PowerShell, Conda, and the Python script. The program either fails with a Docker 500 Internal Server Error or exits silently without any error message after printing the initial configuration log.

To Reproduce

Steps to reproduce the behavior:

Environment Setup:
- OS: Windows 11
- Terminal: PowerShell
- Python Management: Miniconda
- Containerization: Docker Desktop (latest version, with WSL 2 backend)
Installation:
- Create a new Conda environment: conda create -n rdagent-env python=3.10
- Activate the environment: conda activate rdagent-env
- Install the package: pip install rdagent
Configuration:
- Create a working directory (e.g., RD-Agent-Work-Folder).
- Inside the directory, create an .env file with the necessary API keys and model configurations.
Execution:
- From the working directory, attempt to run any rdagent command: rdagent fin_quant

Expected Behavior

The rdagent command should correctly initialize, build the necessary Docker container (local_qlib:latest), and proceed with the task execution. The Docker volume mounts should be compatible with the Windows host file system.

Screenshot

The primary error received during multiple attempts:

docker.errors.APIError: 500 Server Error for http+docker://localnpipe/v1.51/containers/create: Internal Server Error ("mount denied:
the source path "/tmp/full:C:\\workspace\\qlib_workspace\\workspace_cache:rw"
too many colons")

After attempting to patch the source code, subsequent errors included ModuleNotFoundError or the program simply exiting silently after printing the configuration log, with no error.

Environment

Note: The rdagent collect_info command fails to run (exits silently), so this information is gathered manually.

Name of current operating system: Windows 11
Processor architecture: x64
Python version (in Conda env): 3.10
RD-Agent version: (The version installed by pip install rdagent)
Docker-py version: 7.1.0
Conda Environment: Clean environment with only rdagent and its dependencies installed.
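
Since rdagent collect_info exits silently, the facts above can also be gathered with a few lines of standard-library Python run inside the activated Conda environment. This is only a sketch: the helper names are made up for illustration, and it assumes the PyPI distribution names are rdagent and docker (the package that provides docker-py).

```python
import platform
import sys
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def collect_basic_info() -> dict:
    """Gather OS, interpreter, and package facts without using the rdagent CLI."""
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "architecture": platform.machine(),
        "python_version": platform.python_version(),
        "python_executable": sys.executable,
        # Distribution names below are assumptions about how the packages are published.
        "rdagent": package_version("rdagent"),
        "docker": package_version("docker"),
    }


if __name__ == "__main__":
    for key, value in collect_basic_info().items():
        print(f"{key}: {value}")
```

The python_executable entry is worth checking first: if it does not point inside the rdagent-env environment, the wrong interpreter is on the PATH.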

Additional Notes

This issue appears to stem from a fundamental incompatibility with the Windows environment. The debugging process involved several steps:

Initial Diagnosis: The too many colons error strongly suggests that a hardcoded Linux path (/tmp/full) in rdagent/utils/env.py is being used as a source for a Docker volume mount on a Windows host.
Attempted Fix 1 (Code Patch): We manually edited rdagent/utils/env.py to use platform.system() to check for "Windows" and then use tempfile.mkdtemp() to create a compatible temporary directory. This appeared to have no effect, suggesting the changes were not being picked up.
Attempted Fix 2 (Cache Clearing): Suspecting a Python bytecode caching issue (.pyc files), we manually deleted all __pycache__ directories within the rdagent site-packages folder. This also did not resolve the issue.
Attempted Fix 3 (Direct Execution): We encountered significant path resolution issues with PowerShell. conda activate did not reliably place the environment's Python on the PATH. Attempts to run rdagent via more direct means (python -m rdagent.app.cli... or using the absolute path to the conda env's python.exe) led to ModuleNotFoundError or other can't open file errors.
Final State: The core problem seems to be that rdagent's startup and environment management code is not robust enough for a standard Windows/PowerShell/Conda setup. The silent exit of all commands suggests a failure in a pre-flight check before the main application logic (and Docker interaction) even begins.
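
When edits to a file in site-packages seem to have no effect (as in Attempted Fix 1), one thing worth ruling out is that a different interpreter or a different copy of the package is being imported. The sketch below prints where a module would be loaded from; locate_module is a hypothetical helper name, not part of rdagent.

```python
import importlib.util
import sys
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the file a module would be imported from, or None if it cannot be found.

    Note: for a dotted name such as "rdagent.utils.env", find_spec imports the
    parent packages first, so their __init__.py files are executed.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("rdagent:", locate_module("rdagent"))
    print("rdagent.utils.env:", locate_module("rdagent.utils.env"))
```

If the printed path for rdagent.utils.env is not the file that was edited, the patch was applied to a copy that this interpreter never loads.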

A temporary diagnostic script (direct_run.py) was also created to bypass the CLI and directly instantiate the core application logic, but even running this simple script proved impossible due to the persistent path and environment issues.

Command Prompt.txt

env.py.txt

Jul 11 '25 17:07 Zizhao-HUANG

Status Update: Resolved

Root Cause Analysis

The core problem was traced back to the normalize_volumes function within rdagent/utils/env.py. On Windows, the function was incorrectly converting container-side POSIX paths (e.g., /workspace/qlib_workspace/) into absolute Windows paths (e.g., C:\workspace\...).

This resulted in an invalid volume mount string for Docker, such as C:\Users\...\work_dir:C:\workspace\...:rw, which contains too many colons and triggered Docker's mount denied error.
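
The failure mode described above can be checked for before anything is handed to Docker. The following is a minimal sketch, assuming the volumes are held in the docker-py style dict (host path mapped to either a container path string or a dict with a "bind" key). The helper names and the example host path C:\host\work_dir are made up for illustration.

```python
from pathlib import PureWindowsPath


def is_posix_container_path(path: str) -> bool:
    """True if `path` looks like an absolute POSIX path usable inside a Linux container."""
    # PureWindowsPath parses drive letters on any host OS, so this check is portable.
    has_drive = PureWindowsPath(path).drive != ""
    return path.startswith("/") and not has_drive and "\\" not in path


def find_bad_bind_targets(volumes: dict) -> list:
    """Return container-side targets that would break a bind mount specification."""
    bad = []
    for target in volumes.values():
        bind = target["bind"] if isinstance(target, dict) else target
        if not is_posix_container_path(bind):
            bad.append(bind)
    return bad


# Hypothetical example: the first target was wrongly turned into a Windows path.
example = {
    r"C:\host\work_dir": {"bind": r"C:\workspace\qlib_workspace", "mode": "rw"},
    r"C:\host\cache": "/workspace/qlib_workspace/workspace_cache",
}

if __name__ == "__main__":
    print(find_bad_bind_targets(example))
```

A container-side target with a drive letter adds an extra colon to the host:container:mode string, which is consistent with the "too many colons" message in the error above.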

The Fix

A patch was applied directly to the normalize_volumes function to differentiate between host and container path handling:

Host paths: always resolved to their absolute native filesystem path (e.g., C:\Users\...).
Container paths: explicitly converted to POSIX-style paths so that the Docker daemon on Windows does not misinterpret them.

This ensures the final mount instruction is always correctly formatted, regardless of the host operating system.

Jul 11 '25 18:07 Zizhao-HUANG

On Windows, changing the normalize_volumes function in rdagent/utils/env.py as shown below works:


import os
from pathlib import PurePosixPath


def normalize_volumes(vols: dict[str, str | dict[str, str]], working_dir: str) -> dict:
    abs_vols: dict[str, str | dict[str, str]] = {}

    def to_abs_host(path: str) -> str:
        # Host path: resolve to an absolute path in the host's native format.
        return os.path.abspath(path)

    def to_abs_posix_container(path: str) -> str:
        # Container path: always return an absolute POSIX path.
        # An already-absolute path is kept; a relative one is joined to working_dir.
        # PurePosixPath.is_absolute() replaces os.path.isabs() so the check does not
        # depend on how the host OS defines an absolute path.
        if PurePosixPath(path).is_absolute():
            return str(PurePosixPath(path))
        # working_dir is expected to be an absolute path inside the container.
        return str(PurePosixPath(working_dir) / path)

    for lp, vinfo in vols.items():
        abs_host = to_abs_host(lp)
        if isinstance(vinfo, dict):
            # Copy so the caller's dict is not modified.
            vinfo = vinfo.copy()
            vinfo["bind"] = to_abs_posix_container(vinfo["bind"])
            abs_vols[abs_host] = vinfo
        else:
            abs_vols[abs_host] = to_abs_posix_container(vinfo)
    return abs_vols

Jul 15 '25 15:07 lemondy

Thanks, this solved the problem.

Sep 26 '25 08:09 arcayi