For multi-GPU inference, how long does loading the model usually take? I'm using eight 80GB A100s for parallel inference; the script has been running for 20 minutes and inference still hasn't started.
The environment is as follows (Package / Version):
accelerate 1.6.0
aiofiles 24.1.0
aiohappyeyeballs 2.6.1
aiohttp 3.11.16
aiosignal 1.3.2
annotated-types 0.7.0
anyio 4.9.0
async-timeout 5.0.1
attrs 25.3.0
beautifulsoup4 4.13.3
certifi 2025.1.31
charset-normalizer 3.4.1
click 8.1.8
coloredlogs 15.0.1
contourpy 1.3.1
cycler 0.12.1
dashscope 1.23.1
decord 0.6.0
diffusers 0.33.1
DistVAE 0.0.0b5
easydict 1.13
einops 0.8.1
exceptiongroup 1.2.2
fastapi 0.115.12
ffmpy 0.5.0
filelock 3.18.0
flash_attn 2.7.4.post1
flatbuffers 25.2.10
fonttools 4.57.0
frozenlist 1.5.0
fsspec 2025.3.2
ftfy 6.3.1
gradio 5.25.0
gradio_client 1.8.0
groovy 0.1.2
h11 0.14.0
httpcore 1.0.8
httpx 0.28.1
huggingface-hub 0.30.2
humanfriendly 10.0
idna 3.10
imageio 2.37.0
imageio-ffmpeg 0.6.0
importlib_metadata 8.6.1
iniconfig 2.1.0
Jinja2 3.1.6
joblib 1.4.2
kiwisolver 1.4.8
lazy_loader 0.4
markdown-it-py 3.0.0
MarkupSafe 3.0.2
matplotlib 3.10.1
mdurl 0.1.2
mpmath 1.3.0
multidict 6.4.3
networkx 3.4.2
numpy 1.26.4
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-cusparselt-cu12 0.6.2
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
onnxruntime-gpu 1.21.0
opencv-python 4.11.0.86
orjson 3.10.16
packaging 24.2
pandas 2.2.3
pillow 11.1.0
pip 25.0
pluggy 1.5.0
propcache 0.3.1
protobuf 6.30.2
psutil 7.0.0
pycocotools 2.0.8
pydantic 2.11.3
pydantic_core 2.33.1
pydub 0.25.1
Pygments 2.19.1
pyparsing 3.2.3
pytest 8.3.5
python-dateutil 2.9.0.post0
python-multipart 0.0.20
pytz 2025.2
PyYAML 6.0.2
regex 2024.11.6
requests 2.32.3
rich 14.0.0
ruff 0.11.5
safehttpx 0.1.6
safetensors 0.5.3
scikit-image 0.25.2
scikit-learn 1.6.1
scipy 1.15.2
semantic-version 2.10.0
sentencepiece 0.2.0
setuptools 75.8.0
shellingham 1.5.4
six 1.17.0
sniffio 1.3.1
some-package 0.1
soupsieve 2.6
starlette 0.46.1
sympy 1.13.1
threadpoolctl 3.6.0
tifffile 2025.3.30
timm 1.0.15
tokenizers 0.21.1
tomli 2.2.1
tomlkit 0.13.2
torch 2.6.0
torchvision 0.21.0
tqdm 4.67.1
transformers 4.51.2
triton 3.2.0
typer 0.15.2
typing_extensions 4.13.2
typing-inspection 0.4.0
tzdata 2025.2
urllib3 2.4.0
uvicorn 0.34.0
wan 2.1.0
wcwidth 0.2.13
websocket-client 1.8.0
websockets 15.0.1
wheel 0.45.1
xfuser 0.4.3.post3
yarl 1.19.0
yunchang 0.6.3.post1
zipp 3.21.0
Loading time depends mainly on your disk I/O. When running on 8 GPUs, 8 processes read the model files from disk simultaneously. If offload is enabled, the model is also first loaded into CPU memory.
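If you want to sanity-check whether disk throughput is really the bottleneck (rather than the script hanging), a rough test is to time a raw read of the checkpoint shards before launching; the path below is a placeholder for wherever your weights live, and a warm OS page cache can make repeat runs look much faster than a cold read:

```python
# Rough disk-throughput check: time how long it takes just to stream the
# checkpoint shards off disk. The glob pattern is a placeholder path.
import glob
import time

shards = glob.glob("/path/to/SkyReels-V2-checkpoint/*.safetensors")
total_bytes = 0
start = time.time()
for shard in shards:
    with open(shard, "rb") as f:
        while True:
            chunk = f.read(64 * 1024 * 1024)  # read in 64 MB chunks
            if not chunk:
                break
            total_bytes += len(chunk)
elapsed = time.time() - start
print(f"Read {total_bytes / 1e9:.1f} GB in {elapsed:.1f} s "
      f"({total_bytes / 1e9 / max(elapsed, 1e-6):.2f} GB/s)")
```

Multiply the result by the number of processes reading concurrently to get a feel for the worst case on 8 GPUs.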
I posted the same issue here: https://github.com/SkyworkAI/SkyReels-V2/issues/28
The code should be updated to load the model files on one GPU first and then broadcast them to the others. Hunyuan I2V had the same issue: loading took about 20 minutes on multi-GPU versus 5 minutes on a single GPU, and after it was switched to broadcasting from one GPU, the multi-GPU load time dropped to about 5 minutes as well.
I implemented a fix to this in my fork as well: https://github.com/SkyworkAI/SkyReels-V2/issues/28#issuecomment-2826199716
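For reference, here is a minimal sketch of the load-on-one-rank-then-broadcast pattern using torch.distributed. This is not the actual code from the fork; the checkpoint path and the way the model is constructed are placeholders, and it assumes the process group is already initialized (e.g. via torchrun):

```python
# Sketch: only rank 0 reads the weights from disk; the other ranks receive
# them over NCCL, so 8 processes don't all hammer the filesystem at once.
import torch
import torch.distributed as dist

def load_and_broadcast(model: torch.nn.Module, ckpt_path: str, device: torch.device):
    # Assumes dist.init_process_group() has already been called.
    if dist.get_rank() == 0:
        # Only rank 0 touches the disk.
        state_dict = torch.load(ckpt_path, map_location="cpu")
        model.load_state_dict(state_dict)
    model.to(device)
    # Broadcast every parameter and buffer from rank 0 to the other ranks.
    for tensor in list(model.parameters()) + list(model.buffers()):
        dist.broadcast(tensor.data, src=0)
    return model
```

The non-zero ranks construct the model with uninitialized weights and have them overwritten by the broadcast, so only one process pays the disk-read cost.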
The single-GPU script behaves the same way: it takes a very long time before model loading and inference start. Did I set some parameter incorrectly?
How long did you wait? The 720p model is huge so it takes about 10 min even on single GPU.
I waited about 15 minutes or so.
And then the inference progress showed it would take about 2 hours. I could cry.
Yeah, that sounds about right - it's a very heavy model unfortunately. Maybe consider switching down to 540p (it's still very good quality and resolution). There are also two pull requests (mine and chaojie's) that I hope will make it into the repo. chaojie's in particular lets you see each chunk of the video in the middle of the render, which is extremely useful for aborting if the video is already going in the wrong direction instead of waiting the full render time, and it also lets you reimport the partially finished video from another render.
My process currently for using SkyReels V2 consists of using a version with those two pull requests, starting a long DF video, checking periodically if the video is looking alright, and then restarting with the partially completed video to take a different direction if I don't like where it's going.
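I don't know the exact implementation in chaojie's PR, but conceptually it amounts to writing out the partial video after every chunk instead of only at the end. A hypothetical sketch, where `generate_chunk` and `frames_to_video` stand in for the real diffusion-forcing pipeline calls and the context-window length is arbitrary:

```python
# Hypothetical sketch of per-chunk preview saving; generate_chunk() and
# frames_to_video() are stand-ins, not the actual SkyReels-V2 API.
import os

def generate_long_video(pipeline, prompt, num_chunks, out_dir="preview"):
    os.makedirs(out_dir, exist_ok=True)
    frames = []
    for i in range(num_chunks):
        # Each new chunk is conditioned on the tail of what exists so far
        # (the 17-frame window here is arbitrary for the sketch).
        frames.extend(pipeline.generate_chunk(prompt, context=frames[-17:]))
        # Dump the partial result after every chunk so it can be inspected
        # mid-render, aborted early, or re-imported into another run.
        pipeline.frames_to_video(frames, os.path.join(out_dir, f"partial_{i:03d}.mp4"))
    return frames
```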
Thanks for the answer. I'll try chaojie's code. Also, if I drop the resolution to 540p, do I need to download the 540P model? So far I've only downloaded the 720P model.
Yes, it's a new download for the 540p model, but it should be automatic from the script (you don't need to download it manually - I got confused at first myself trying to figure out where to put the files).
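If you'd rather grab the weights ahead of time instead of letting the script fetch them on first run, huggingface_hub can pre-download them. The repo ID below is my guess at the 540P checkpoint name, so check it against the links in the README first:

```python
# Pre-download the 540P weights so the first run doesn't stall on the download.
# The repo_id is an assumption - verify it against the links in the repo README.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="Skywork/SkyReels-V2-DF-14B-540P",
    local_dir="./SkyReels-V2-DF-14B-540P",
)
print("Downloaded to:", local_dir)
```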
Got it, thanks, I'll give it a try. The model download links are already posted on GitHub.