Alpaca-CoT
The recommended finetune command for ChatGLM OOMs on a 3090 (24 GB); the code's default 8-bit quantization likewise leads to OOM.
Issue 1:
python3 uniform_finetune.py --model_type chatglm --model_name_or_path THUDM/chatglm-6b \
--data alpaca-belle-cot --lora_target_modules query_key_value \
--lora_r 32 --lora_alpha 32 --lora_dropout 0.1 --per_gpu_train_batch_size 2 \
--learning_rate 2e-5 --epochs 1
Running the above command hits OOM during the training phase:
RuntimeError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 23.69 GiB total capacity; 22.48 GiB already allocated; 6.06 MiB free; 22.55 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
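The error message itself points at one mitigation: when reserved memory far exceeds allocated memory, fragmentation may be the problem, and `max_split_size_mb` can help. A minimal sketch of setting it from Python (the 128 MiB value is an illustrative assumption to tune, not a value from this repo); note it must be set before `torch` first touches CUDA:

```python
import os

# PyTorch reads PYTORCH_CUDA_ALLOC_CONF when the CUDA caching allocator
# initializes, so this must run before the first `import torch`.
# 128 MiB is an illustrative starting point, not a recommended value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

Equivalently, it can be set in the shell for a single run: `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python3 uniform_finetune.py ...`. This only reduces fragmentation waste; it cannot save a run that genuinely needs more than 24 GB.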
With the following command, GLM training proceeds into the training phase without OOM so far:
python3 uniform_finetune.py --model_type chatglm --model_name_or_path /workspace/para/chatglm-6b \
--data instinwild_ch --lora_target_modules query_key_value \
--per_gpu_train_batch_size 1 --epochs 1 \
--report_to wandb
GPU usage during training:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12 Driver Version: 525.85.12 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:0A:00.0 Off | N/A |
| 31% 65C P2 305W / 350W | 21916MiB / 24576MiB | 78% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
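The ~21.9 GiB footprint shown by nvidia-smi is consistent with back-of-envelope arithmetic (a rough sketch; the parameter count is approximate and the overhead discussion is an assumption, not a measurement):

```python
# Back-of-envelope VRAM estimate for LoRA-finetuning a 6B model in fp16.
params = 6_000_000_000        # ChatGLM-6B parameter count (approximate)
bytes_per_param = 2           # fp16 weight storage

weights_gib = params * bytes_per_param / 2**30
print(f"fp16 weights alone: {weights_gib:.1f} GiB")

# With LoRA, only the small adapter gets gradients and optimizer state,
# but activations, adapter gradients, the CUDA context, and the caching
# allocator still add several GiB on top of the ~11 GiB of weights.
# On a 24 GiB card there is little slack, which is plausibly why
# per_gpu_train_batch_size=2 tips it over while batch size 1 just fits.
```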
Issue 2:
According to the README, int8 quantization cannot be used when training GLM, but the finetune code does not check for this case and skip that handling, which leads to OOM:
You can comment out that line; once it is commented out, the OOM no longer occurs at this point.
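Rather than commenting the line out by hand, the skip could be made conditional. A hedged sketch of such a guard, assuming the script keys on `model_type` the way its CLI flag suggests; the helper name and the `load_in_8bit` parameter are hypothetical, introduced here only for illustration:

```python
# Hypothetical guard: run the int8 preparation path only for model types
# that support it; per the README, ChatGLM does not.
INT8_UNSUPPORTED = {"chatglm"}

def should_prepare_int8(model_type: str, load_in_8bit: bool) -> bool:
    """Return True only when int8 preparation is both requested and supported."""
    return load_in_8bit and model_type.lower() not in INT8_UNSUPPORTED

# In the finetune script this would wrap the existing int8-prep call, e.g.:
# if should_prepare_int8(args.model_type, load_in_8bit):
#     model = prepare_model_for_int8_training(model)
print(should_prepare_int8("chatglm", True))  # False: skipped for GLM
print(should_prepare_int8("llama", True))    # True
```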
ChatGLM is only 6B, yet I get OutOfMemoryError even with 32 GB; strange, and I haven't found the cause.