
DummyScheduler issue while fine-tuning with LoRA

Open ff151 opened this issue 1 year ago • 3 comments

I followed the document below to fine-tune with LoRA and ran into a DummyScheduler error. How should I fix it? https://github.com/OpenGVLab/InternVL/blob/main/document/how_to_finetune_internvl_chat_v1_2_on_a_custom_dataset.md

Command run: `CUDA_VISIBLE_DEVICES=0,1 sh shell/hermes2_yi34b/internvl_chat_v1_2_hermes2_yi34b_448_res_finetune_continue_lora.sh`

Error encountered:

```
File "/root/miniconda3/lib/python3.9/site-packages/accelerate/accelerator.py", line 1468, in _prepare_deepspeed
    raise ValueError(
ValueError: You cannot create a DummyScheduler without specifying a scheduler in the config file.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 110710) of binary: /root/miniconda3/bin/python
```
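For context: accelerate raises this error in its DeepSpeed integration when the training setup asks for a learning-rate scheduler but the DeepSpeed JSON config contains no `scheduler` section, so accelerate cannot build its `DummyScheduler` placeholder. One possible workaround (an assumption based on the error message, not something confirmed in this thread) is to add a `scheduler` block to the DeepSpeed config file that the launch script passes in, e.g.:

```json
{
  "scheduler": {
    "type": "WarmupDecayLR",
    "params": {
      "warmup_min_lr": 0,
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto",
      "total_num_steps": "auto"
    }
  }
}
```

The `"auto"` values are filled in by the Hugging Face Trainer integration from the training arguments; the exact config file used by this script may differ.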

ff151 avatar Jun 20 '24 03:06 ff151

Could you share your environment information, e.g. the versions of transformers, accelerate, etc.?

czczup avatar Jul 31 '24 07:07 czczup

> Could you share your environment information, e.g. the versions of transformers, accelerate, etc.?

```
accelerate 0.21.0 aiofiles 23.2.1 aiohttp 3.9.4 aiosignal 1.3.1 aiostream 0.5.2 altair 5.3.0 annotated-types 0.6.0 anyio 4.3.0 appdirs 1.4.4 async-timeout 4.0.3 attrs 23.2.0 bitsandbytes 0.42.0 certifi 2024.2.2 charset-normalizer 3.3.2 click 8.1.7 contourpy 1.2.0 cycler 0.12.1 decord 0.6.0 deepspeed 0.13.5 docker-pycreds 0.4.0 einops 0.6.1 einops-exts 0.0.4 exceptiongroup 1.2.0 fastapi 0.110.0 ffmpy 0.3.2 filelock 3.13.3 flash-attn 2.5.6 fonttools 4.50.0 frozenlist 1.4.1 fsspec 2024.3.1 gitdb 4.0.11 GitPython 3.1.43 gradio 4.16.0 gradio_client 0.8.1 grpclib 0.4.7 h11 0.14.0 h2 4.1.0 hjson 3.1.0 hpack 4.0.0 httpcore 0.17.3 httpx 0.24.0 huggingface-hub 0.22.2 hyperframe 6.0.1 idna 3.6 imageio 2.34.2 importlib_resources 6.4.0 Jinja2 3.1.3 joblib 1.3.2 jsonschema 4.21.1 jsonschema-specifications 2023.12.1 kiwisolver 1.4.5 markdown-it-py 3.0.0 markdown2 2.4.13 MarkupSafe 2.1.5 matplotlib 3.8.3 mdurl 0.1.2 modal 0.62.68 modal-client 0.62.68 mpmath 1.3.0 multidict 6.0.5 networkx 3.2.1 ninja 1.11.1.1 numpy 1.26.4 nvidia-cublas-cu12 12.1.3.1 nvidia-cuda-cupti-cu12 12.1.105 nvidia-cuda-nvrtc-cu12 12.1.105 nvidia-cuda-runtime-cu12 12.1.105 nvidia-cudnn-cu12 8.9.2.26 nvidia-cufft-cu12 11.0.2.54 nvidia-curand-cu12 10.3.2.106 nvidia-cusolver-cu12 11.4.5.107 nvidia-cusparse-cu12 12.1.0.106 nvidia-nccl-cu12 2.18.1 nvidia-nvjitlink-cu12 12.4.99 nvidia-nvtx-cu12 12.1.105 opencv-python 4.10.0.84 orjson 3.10.0 packaging 24.0 pandas 2.2.1 peft 0.10.0 pillow 10.2.0 pip 24.0 protobuf 4.25.3 psutil 5.9.8 py-cpuinfo 9.0.0 pydantic 2.6.4 pydantic_core 2.16.3 pydub 0.25.1 Pygments 2.17.2 pynvml 11.5.0 pyparsing 3.1.2 python-dateutil 2.9.0.post0 python-multipart 0.0.9 pytz 2024.1 PyYAML 6.0.1 referencing 0.34.0 regex 2023.12.25 requests 2.31.0 rich 13.7.1 rpds-py 0.18.0 ruff 0.3.4 safetensors 0.4.2 scikit-learn 1.2.2 scipy 1.12.0 semantic-version 2.10.0 sentencepiece 0.1.99 sentry-sdk 1.44.0 setproctitle 1.3.3 setuptools 68.2.2 shellingham 1.5.4 shortuuid 1.0.13 sigtools 4.0.1 six 1.16.0
smmap 5.0.1 sniffio 1.3.1 starlette 0.36.3 svgwrite 1.4.3 sympy 1.12 synchronicity 0.6.7 tensorboardX 2.6.2.2 threadpoolctl 3.4.0 timm 0.6.13 tokenizers 0.15.1 toml 0.10.2 tomlkit 0.12.0 toolz 0.12.1 torch 2.1.2 torchvision 0.16.2 tqdm 4.66.2 transformers 4.37.2 triton 2.1.0 typer 0.9.4 typer-cli 0.12.0 typer-slim 0.12.0 types-certifi 2021.10.8.3 types-toml 0.10.8.20240310 typing_extensions 4.10.0 tzdata 2024.1 urllib3 2.2.1 uvicorn 0.29.0 wandb 0.16.5 watchfiles 0.21.0 wavedrom 2.0.3.post3 websockets 11.0.3 wheel 0.41.2 yarl 1.9.4
```

HhhjHu avatar Jul 31 '24 08:07 HhhjHu

Same problem here; just update accelerate to 0.33.0 (`pip install accelerate==0.33.0`) 👍
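The environment above shows accelerate 0.21.0, well below the 0.33.0 that reportedly resolves this. A minimal sketch of the version check (the helper names here are hypothetical, and this naive tuple comparison only handles plain `X.Y.Z` versions; use `packaging.version` for anything fancier):

```python
# Check whether an installed accelerate version predates the
# release reported in this thread to fix the DummyScheduler error.
def version_tuple(v: str) -> tuple:
    """Parse a simple 'X.Y.Z' version string into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

def needs_upgrade(installed: str, fixed: str = "0.33.0") -> bool:
    """Return True if `installed` is older than the reported fix version."""
    return version_tuple(installed) < version_tuple(fixed)

print(needs_upgrade("0.21.0"))  # True  -- the reporter's version
print(needs_upgrade("0.33.0"))  # False -- already at the fix version
```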

torinchen avatar Aug 14 '24 08:08 torinchen