LLaMA-Factory

Running the multi-GPU single-node example fails

Open kian-zhao opened this issue 8 months ago • 2 comments

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

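Following the instructions, I installed llama-factory in a conda virtual environment, using the required steps and the Windows user guide. I then tried the multi-GPU LoRA fine-tuning command: CUDA_VISIBLE_DEVICES=0,1 llamafactory-cli train examples/lora_multi_gpu/llama3_lora_sft.yaml. It fails with the error: llamafactory.cli - Initializing distributed tasks at: 127.0.0.1:29437 failed to create process. I am not sure whether something is missing or misconfigured in my environment.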
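A note for anyone reproducing this on Windows: the VAR=value prefix in front of a command is POSIX shell syntax, which cmd.exe and PowerShell do not accept. Whether that is related to the error above has not been established. The following is a minimal sketch, assuming llamafactory-cli is on PATH, of launching the same command with the variable set from Python instead. The helper name build_launch is hypothetical and not part of LLaMA-Factory.

```python
import os
import shutil
import subprocess


def build_launch(devices, config):
    """Return (command, environment) for the training run.

    `build_launch` is a hypothetical helper for illustration only.
    """
    env = dict(os.environ)                 # copy, so the parent env is untouched
    env["CUDA_VISIBLE_DEVICES"] = devices  # e.g. "0,1" to expose two GPUs
    return ["llamafactory-cli", "train", config], env


if __name__ == "__main__":
    cmd, env = build_launch("0,1", "examples/lora_multi_gpu/llama3_lora_sft.yaml")
    exe = shutil.which(cmd[0])             # resolve the launcher on PATH
    if exe is None:
        print("llamafactory-cli not found on PATH")
    else:
        # check=False: report the exit code instead of raising
        result = subprocess.run([exe, *cmd[1:]], env=env, check=False)
        print("exit code:", result.returncode)
```

Setting the variable through the env argument keeps the launch identical across shells, which makes it easier to rule the shell out as a factor.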

Expected behavior

Training runs normally.

System Info

Win11, cuda=12.1, pytorch=py3.12_cuda12.1_cudnn8_0

accelerate 0.30.1 pyhd8ed1ab_0 conda-forge
aiofiles 23.2.1 pypi_0 pypi
aiohttp 3.9.5 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
altair 5.3.0 pypi_0 pypi
annotated-types 0.7.0 pypi_0 pypi
anyio 4.4.0 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
bitsandbytes 0.41.2.post2 pypi_0 pypi
blas 1.0 mkl
brotli-python 1.0.9 py312hd77b12b_8
bzip2 1.0.8 h2bbff1b_6
ca-certificates 2024.3.11 haa95532_0
certifi 2024.2.2 py312haa95532_0
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.1.7 pypi_0 pypi
colorama 0.4.6 py312haa95532_0
contourpy 1.2.1 pypi_0 pypi
cuda-cccl 12.4.127 0 nvidia
cuda-cudart 12.1.105 0 nvidia
cuda-cudart-dev 12.1.105 0 nvidia
cuda-cupti 12.1.105 0 nvidia
cuda-libraries 12.1.0 0 nvidia
cuda-libraries-dev 12.1.0 0 nvidia
cuda-nvrtc 12.1.105 0 nvidia
cuda-nvrtc-dev 12.1.105 0 nvidia
cuda-nvtx 12.1.105 0 nvidia
cuda-opencl 12.4.127 0 nvidia
cuda-opencl-dev 12.4.127 0 nvidia
cuda-profiler-api 12.4.127 0 nvidia
cuda-runtime 12.1.0 0 nvidia
cycler 0.12.1 pypi_0 pypi
datasets 2.19.1 pypi_0 pypi
dill 0.3.8 pypi_0 pypi
dnspython 2.6.1 pypi_0 pypi
docstring-parser 0.16 pypi_0 pypi
einops 0.8.0 pypi_0 pypi
email-validator 2.1.1 pypi_0 pypi
expat 2.6.2 hd77b12b_0
fastapi 0.111.0 pypi_0 pypi
fastapi-cli 0.0.4 pypi_0 pypi
ffmpy 0.3.2 pypi_0 pypi
filelock 3.13.1 py312haa95532_0
fire 0.6.0 pypi_0 pypi
fonttools 4.52.4 pypi_0 pypi
freetype 2.12.1 ha860e81_0
frozenlist 1.4.1 pypi_0 pypi
fsspec 2024.3.1 py312haa95532_0
gradio 4.31.5 pypi_0 pypi
gradio-client 0.16.4 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
httpcore 1.0.5 pypi_0 pypi
httptools 0.6.1 pypi_0 pypi
httpx 0.27.0 pypi_0 pypi
huggingface-hub 0.23.2 pypi_0 pypi
huggingface_hub 0.23.1 py312haa95532_0
idna 3.7 py312haa95532_0
importlib-resources 6.4.0 pypi_0 pypi
intel-openmp 2023.1.0 h59b6b97_46320
jieba 0.42.1 pypi_0 pypi
jinja2 3.1.3 py312haa95532_0
joblib 1.4.2 pypi_0 pypi
jpeg 9e h2bbff1b_1
jsonschema 4.22.0 pypi_0 pypi
jsonschema-specifications 2023.12.1 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
lcms2 2.12 h83e58a3_0
lerc 3.0 hd77b12b_0
libcublas 12.1.0.26 0 nvidia
libcublas-dev 12.1.0.26 0 nvidia
libcufft 11.0.2.4 0 nvidia
libcufft-dev 11.0.2.4 0 nvidia
libcurand 10.3.5.147 0 nvidia
libcurand-dev 10.3.5.147 0 nvidia
libcusolver 11.4.4.55 0 nvidia
libcusolver-dev 11.4.4.55 0 nvidia
libcusparse 12.0.2.55 0 nvidia
libcusparse-dev 12.0.2.55 0 nvidia
libdeflate 1.17 h2bbff1b_1
libffi 3.4.4 hd77b12b_1
libjpeg-turbo 2.0.0 h196d8e1_0
libnpp 12.0.2.50 0 nvidia
libnpp-dev 12.0.2.50 0 nvidia
libnvjitlink 12.1.105 0 nvidia
libnvjitlink-dev 12.1.105 0 nvidia
libnvjpeg 12.1.1.14 0 nvidia
libnvjpeg-dev 12.1.1.14 0 nvidia
libpng 1.6.39 h8cc25b3_0
libtiff 4.5.1 hd77b12b_0
libuv 1.44.2 h2bbff1b_0
libwebp-base 1.3.2 h2bbff1b_0
llamafactory 0.7.2.dev0 pypi_0 pypi
lz4-c 1.9.4 h2bbff1b_1
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.3 py312h2bbff1b_0
matplotlib 3.9.0 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mkl 2023.1.0 h6b88ed4_46358
mkl-service 2.4.0 py312h2bbff1b_1
mkl_fft 1.3.8 py312h2bbff1b_0
mkl_random 1.2.4 py312h59b6b97_0
mpmath 1.3.0 py312haa95532_0
multidict 6.0.5 pypi_0 pypi
multiprocess 0.70.16 pypi_0 pypi
networkx 3.1 py312haa95532_0
nltk 3.8.1 pypi_0 pypi
numpy 1.26.4 py312hfd52020_0
numpy-base 1.26.4 py312h4dde369_0
openjpeg 2.4.0 h4fc8c34_0
openssl 3.0.13 h2bbff1b_2
orjson 3.10.3 pypi_0 pypi
packaging 24.0 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
peft 0.11.1 pypi_0 pypi
pillow 10.3.0 py312h2bbff1b_0
pip 24.0 py312haa95532_0
protobuf 5.27.0 pypi_0 pypi
psutil 5.9.8 pypi_0 pypi
pyarrow 16.1.0 pypi_0 pypi
pyarrow-hotfix 0.6 pypi_0 pypi
pydantic 2.7.1 pypi_0 pypi
pydantic-core 2.18.2 pypi_0 pypi
pydub 0.25.1 pypi_0 pypi
pygments 2.18.0 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
pysocks 1.7.1 py312haa95532_0
python 3.12.3 h1d929f7_1
python-dateutil 2.9.0.post0 pypi_0 pypi
python-dotenv 1.0.1 pypi_0 pypi
python-multipart 0.0.9 pypi_0 pypi
pytorch 2.2.2 py3.12_cuda12.1_cudnn8_0 pytorch
pytorch-cuda 12.1 hde6ce7c_5 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 py312h2bbff1b_0
referencing 0.35.1 pypi_0 pypi
regex 2024.5.15 pypi_0 pypi
requests 2.31.0 py312haa95532_1
rich 13.7.1 pypi_0 pypi
rouge-chinese 1.0.3 pypi_0 pypi
rpds-py 0.18.1 pypi_0 pypi
ruff 0.4.5 pypi_0 pypi
safetensors 0.4.3 pypi_0 pypi
scipy 1.13.1 pypi_0 pypi
semantic-version 2.10.0 pypi_0 pypi
sentencepiece 0.2.0 pypi_0 pypi
setuptools 69.5.1 py312haa95532_0
shellingham 1.5.4 pypi_0 pypi
shtab 1.7.1 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
sqlite 3.45.3 h2bbff1b_0
sse-starlette 2.1.0 pypi_0 pypi
starlette 0.37.2 pypi_0 pypi
sympy 1.12 py312haa95532_0
tbb 2021.8.0 h59b6b97_0
termcolor 2.4.0 pypi_0 pypi
tk 8.6.14 h0416ee5_0
tokenizers 0.19.1 pypi_0 pypi
tomlkit 0.12.0 pypi_0 pypi
toolz 0.12.1 pypi_0 pypi
torchaudio 2.2.2 pypi_0 pypi
torchvision 0.17.2 pypi_0 pypi
tqdm 4.66.4 py312hfc267ef_0
transformers 4.41.1 pypi_0 pypi
trl 0.8.6 pypi_0 pypi
typer 0.12.3 pypi_0 pypi
typing-extensions 4.11.0 py312haa95532_0
typing_extensions 4.11.0 py312haa95532_0
tyro 0.8.4 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
ujson 5.10.0 pypi_0 pypi
urllib3 2.2.1 py312haa95532_0
uvicorn 0.29.0 pypi_0 pypi
vc 14.2 h2eaa2aa_1
vs2015_runtime 14.29.30133 h43f2093_3
watchfiles 0.22.0 pypi_0 pypi
websockets 11.0.3 pypi_0 pypi
wheel 0.43.0 py312haa95532_0
win_inet_pton 1.1.0 py312haa95532_0
xxhash 3.4.1 pypi_0 pypi
xz 5.4.6 h8cc25b3_1
yaml 0.2.5 he774522_0
yarl 1.9.4 pypi_0 pypi
zlib 1.2.13 h8cc25b3_1
zstd 1.5.5 hd43e919_2

Others

No response

kian-zhao, May 29 '24 03:05