
ValueError: ChatGLMForConditionalGeneration does not support gradient checkpointing.

Open · deepeye opened this issue on Apr 19, 2023 · 7 comments

The following error is raised during training:

(venv) [xinjingjing@dev-gpu-node-09 InstructGLM]$ python train_lora.py \
>     --dataset_path data/belle \
>     --lora_rank 8 \
>     --per_device_train_batch_size 2 \
>     --gradient_accumulation_steps 1 \
>     --max_steps 52000 \
>     --save_steps 1000 \
>     --save_total_limit 2 \
>     --learning_rate 2e-5 \
>     --fp16 \
>     --remove_unused_columns false \
>     --logging_steps 50 \
>     --output_dir output

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.2/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 112
CUDA SETUP: Loading binary /data/chat/InstructGLM/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda112.so...
Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:12<00:00,  1.51s/it]
Traceback (most recent call last):
  File "/data/chat/InstructGLM/train_lora.py", line 170, in <module>
    main()
  File "/data/chat/InstructGLM/train_lora.py", line 128, in main
    model.gradient_checkpointing_enable()
  File "/data/chat/InstructGLM/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1584, in gradient_checkpointing_enable
    raise ValueError(f"{self.__class__.__name__} does not support gradient checkpointing.")
ValueError: ChatGLMForConditionalGeneration does not support gradient checkpointing.
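This error comes from transformers' `PreTrainedModel.gradient_checkpointing_enable()`, which raises unless the model class sets the `supports_gradient_checkpointing` attribute to `True`; the revision of ChatGLM-6B's custom `modeling_chatglm.py` loaded here apparently does not set it. A minimal workaround sketch for `train_lora.py`, assuming the model variable is named `model` as in the traceback (the guard itself is my own, not the repo's code):

```python
# Hypothetical guard around the failing call (train_lora.py line 128):
# enable gradient checkpointing only if the remote-code model class
# declares support, instead of letting transformers raise a ValueError.
if getattr(model, "supports_gradient_checkpointing", False):
    model.gradient_checkpointing_enable()
else:
    # Without checkpointing, activation memory grows; lowering
    # --per_device_train_batch_size may be needed to avoid OOM.
    print(f"{model.__class__.__name__} does not support gradient "
          f"checkpointing; continuing without it.")
```

Alternatively, later revisions of the ChatGLM-6B model files reportedly added gradient checkpointing support, so re-downloading the custom model code (or pinning a newer `revision`) may make the original call work as-is.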

deepeye · Apr 19, 2023
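
A side note on the loader warnings earlier in the log: they suggest pinning a `revision` when loading custom-code models and passing `torch_dtype=torch.float16` explicitly under int8 loading. A sketch of a load call along those lines, where the model id and revision are assumptions for illustration, not taken from the log:

```python
import torch
from transformers import AutoModel

# Assumed model id and revision, shown only to illustrate the warnings'
# advice: pinning `revision` addresses the custom-code safety warning,
# and an explicit float16 dtype avoids the bitsandbytes override notice.
model = AutoModel.from_pretrained(
    "THUDM/chatglm-6b",
    trust_remote_code=True,    # honored by Auto classes, per the warning
    load_in_8bit=True,         # bitsandbytes mixed-int8 loading
    torch_dtype=torch.float16,
    revision="main",
)
```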