
Windows 24H2 update causes slow inference

XianYue0125 opened this issue 7 months ago • 3 comments

📌 Summary

When running GPT-SoVITS v2/v3 inference on an RTX 4090 under Windows 11 Version 24H2, GPU utilization is severely low and performance drops sharply. On Linux, or after downgrading the system to 23H2, performance returns to normal, which suggests the problem is tied to the 24H2 release.

📊 Detailed Test Info

| Item | Value |
| --- | --- |
| GPU Model | NVIDIA RTX 4090 |
| PyTorch version | 2.0.0 + cu118 (official pip install) |
| cuDNN version | 8.7.0 |
| CUDA driver version | 11.8 / 12.8 both tested |
| Driver versions | 537.58 / 551.61 / 572.70 / 572.83 all tested, same behavior |
| OS | Windows 11 24H2 (problem) / 23H2 (normal) |
| Models tested | GPT-SoVITS v2 & v3 |
| Workload | Inference only, no training |
| AMP | Yes, inference uses autocast / fp16 |
| Tools | nvidia-smi / Gradio WebUI measurements |
| Normal on Linux? | Yes, completely normal on Linux (Ubuntu 20.04 / 22.04) |

⚠️ Behavior

- GPU utilization is too low (below 30%)
- Inference is slower than expected (slower than an RTX 4070)
- With the same code, same model, and same driver version:
  - Linux + RTX 4090: normal, very fast
  - Windows 11 23H2 + RTX 4090: normal
  - Windows 11 24H2 + RTX 4090: abnormal, severe slowdown


✅ Verified Workaround

After downgrading Windows from 24H2 to 23H2, GPU utilization recovers immediately and inference speed improves noticeably; the Tensor Cores can be kept fully loaded.

🔧 Suggested follow-up for GPT-SoVITS: evaluate whether there is a cuDNN kernel fallback or a difference in the AMP execution strategy.

Screenshots (omitted): GPU utilization on 24H2 vs. 23H2.

🐞 Bug Report: RTX 4090 Inference Performance Regression on Windows 11 24H2

📌 Summary

When running GPT-SoVITS v2/v3 inference using PyTorch with AMP (autocast, fp16) on an RTX 4090 under Windows 11 Version 24H2, GPU utilization is abnormally low (<30%) and inference speed is significantly reduced. In contrast:

- The same code and model run perfectly on Linux (Ubuntu).
- The same GPU performs normally on Windows 11 23H2 after a downgrade.
- Another machine with an RTX 4070 + Windows 11 23H2 outperforms the 4090 on 24H2.

This suggests the issue is specifically tied to changes in Windows 11 24H2 and possibly how it interacts with NVIDIA drivers or cuDNN optimization logic.


📊 Test Configuration

| Component | Value |
| --- | --- |
| GPUs tested | RTX 4090 (main issue), RTX 4070 (normal) |
| OS versions tested | Windows 11 24H2 (problem), 23H2 (normal), Ubuntu 20.04/22.04 (normal) |
| PyTorch version | 2.0.0 + cu118 (official pip version) |
| CUDA runtime | 11.8 (also tested 12.1) |
| cuDNN version | 8.7.0 |
| Drivers tested | 537.58, 551.61, 572.70, 572.83 |
| Task type | Inference only (no training) |
| Models tested | GPT-SoVITS v2, v3 |
| AMP | Enabled (autocast + fp16) |
| Tools used | nvidia-smi, Python logs, WebUI timing display |
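For context, the AMP row above refers to the usual PyTorch pattern of running the forward pass under autocast with fp16. A minimal sketch of that setup (the Linear layer and random input are placeholders, not GPT-SoVITS code; assumes a CUDA build of PyTorch and a GPU):

import torch

# Minimal illustration of the AMP configuration listed above: autocast + fp16, inference only.
assert torch.cuda.is_available(), "requires a CUDA build of PyTorch and a GPU"
model = torch.nn.Linear(512, 512).cuda().eval()
x = torch.randn(8, 512, device="cuda")

with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)

print(y.dtype)  # torch.float16: matmul-type ops run in half precision under autocast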


⚠️ Symptoms

On Windows 11 24H2 + RTX 4090:
- GPU utilization is consistently low (~30%)
- Inference is much slower than expected

On Windows 11 23H2:
- Performance is fully recovered
- GPU runs near 100% utilization

On Linux:
- Everything performs as expected

On another machine with RTX 4070 + Windows 11 23H2:
- Performs better than the RTX 4090 on 24H2


✅ Workaround Confirmed

After a clean reinstallation using the Windows 11 23H2 ISO, inference speed and GPU utilization on the RTX 4090 returned to normal levels.

XianYue0125 · Apr 09 '25

Try switching the PyTorch version, e.g. torch2.5.1+cu124.
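If you do swap builds, it can help to confirm which torch / CUDA / cuDNN stack the running process actually loads; a quick check using standard PyTorch APIs (nothing GPT-SoVITS-specific):

import torch

# Print the versions the running process actually sees, so an upgrade to e.g.
# a cu124 wheel can be verified before re-running inference.
print("torch:", torch.__version__)
print("CUDA (built against):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))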

XXXXRT666 · Apr 09 '25

Try using Task Manager to set the Python process priority to High? I had the same problem and that fixed it for me.
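The same change can also be made programmatically with only the standard library; a sketch using the Win32 API via ctypes (Windows only; an illustrative alternative to Task Manager, not code from the repo):

import ctypes
import os

HIGH_PRIORITY_CLASS = 0x00000080  # Win32 priority-class constant

def raise_priority_win32():
    """Set the current process to HIGH_PRIORITY_CLASS via kernel32 (Windows only)."""
    if os.name != "nt":
        return
    kernel32 = ctypes.windll.kernel32
    handle = kernel32.GetCurrentProcess()  # pseudo-handle for this process
    if not kernel32.SetPriorityClass(handle, HIGH_PRIORITY_CLASS):
        print("SetPriorityClass failed; try running as administrator")

raise_priority_win32()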

YYuX-1145 · Apr 13 '25

Solved. I'm using inference_webui.py; the fix is to add the following code at the top of the inference code in inference_webui.py:

import psutil
import os

def set_high_priority():
    """Set the current Python process to HIGH_PRIORITY_CLASS."""
    if os.name != "nt":
        return  # only meaningful on Windows
    p = psutil.Process(os.getpid())
    try:
        p.nice(psutil.HIGH_PRIORITY_CLASS)
        print("Process priority set to High")
    except psutil.AccessDenied:
        print("Insufficient permissions to change priority (run as administrator)")

set_high_priority()

Then you can run `nvidia-smi dmon -s u` and watch the `sm` column to see the change.

In my tests, before adding this code the `sm` value was capped around 40%, power draw was limited, and inference took a long time.

After adding it, `sm` rises to about 90% and inference finishes quickly.
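If you prefer to log that change from Python rather than watching `nvidia-smi dmon`, a small polling sketch with the pynvml bindings (assumed installed, e.g. via `pip install nvidia-ml-py`; not part of GPT-SoVITS) could look like this:

import time
import pynvml

# Poll SM utilization of GPU 0 a few times; comparable to the `sm` column of
# `nvidia-smi dmon -s u`.
pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
for _ in range(10):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    print(f"sm utilization: {util.gpu}%")
    time.sleep(1)
pynvml.nvmlShutdown()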

XianYue0125 · Apr 17 '25

> (quote of XianYue0125's set_high_priority fix above, omitted)

Awesome work!

RVC-Boss · Jul 10 '25