LZB issues

Results: 3 issues by LZB

LoRA training of the 13B model runs out of GPU memory even with 90 GB available

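With torchrun --nproc_per_node 1 train.py on a single GPU, the pretrained model downloads fine, but training fails after running for a while. ![008d22342783bf309466e08a69153c4](https://github.com/LianjiaTech/BELLE/assets/68055934/8cb27571-95f9-4360-b208-e634a922a20c) ![0ec52adcf9c211e04c8fbf51dc2e90a](https://github.com/LianjiaTech/BELLE/assets/68055934/2e1224fe-c8b4-45aa-b6f8-a951f062e395) With torchrun --nproc_per_node 8 train.py on multiple GPUs, CUDA already runs out of memory while the pretrained model is being downloaded. I have about 90 GB of GPU memory. Is that really not enough for training?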

Which parameters should be changed to reduce GPU memory usage during full fine-tuning?

With 4 A100s (160 GB of GPU memory in total), training always fails with an out-of-memory error. How should I adjust the parameters, or what else can I change, to get training to run?

    # #FT
    torchrun --nproc_per_node 4 /home/jovyan/vol-1/BELLE/train/src/train.py \
        --model_name_or_path ${model_name_or_path} \
        --llama \
        --deepspeed configs/deepspeed_config_stage3.json \
        --train_file ${train_file} \
        --validation_file ${validation_file} \
        --per_device_train_batch_size 1 \
        --per_device_eval_batch_size 1 \
        --gradient_accumulation_steps 1...
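A back-of-the-envelope estimate can help frame this question. The sketch below assumes mixed-precision training with Adam, for which a commonly cited figure is about 16 bytes of model state per parameter (fp16 weights and gradients plus fp32 master weights, momentum, and variance). It also assumes ZeRO stage 3 splits that state evenly across GPUs, and that each A100 has 40 GiB (inferred from "4 A100s, 160 GB"). The function names are hypothetical, and activations, buffers, and fragmentation are ignored, so real usage would be higher.

```python
# Rough model-state estimate for full fine-tuning (illustrative only).
# Assumption: ~16 bytes per parameter for mixed-precision Adam
# (2 fp16 weights + 2 fp16 grads + 12 fp32 master weights/momentum/variance).
BYTES_PER_PARAM_MIXED_ADAM = 16
GIB = 2 ** 30


def model_state_gib(num_params, bytes_per_param=BYTES_PER_PARAM_MIXED_ADAM):
    """Total weights + gradients + optimizer state, in GiB."""
    return num_params * bytes_per_param / GIB


def per_gpu_state_gib(num_params, num_gpus,
                      bytes_per_param=BYTES_PER_PARAM_MIXED_ADAM):
    """Per-GPU share, assuming the state is partitioned evenly (ZeRO-3 style)."""
    return model_state_gib(num_params, bytes_per_param) / num_gpus


if __name__ == "__main__":
    params_13b = 13e9          # nominal parameter count of a 13B model
    gpu_capacity_gib = 40      # assumed per-GPU capacity, not from the issue
    total = model_state_gib(params_13b)
    share = per_gpu_state_gib(params_13b, num_gpus=4)
    print(f"total model state: {total:.1f} GiB")
    print(f"per-GPU share on 4 GPUs: {share:.1f} GiB")
    print("fits without offload:", share < gpu_capacity_gib)
```

Under these assumptions the model state alone exceeds what four 40 GiB cards hold, before any activations are counted. That would be consistent with the reported error and is why options such as gradient checkpointing or offloading optimizer state to CPU are often considered in this situation.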

BELLE-LLaMA-EXT-13B model decoding fails

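Writing final chunks... Error: Checksums do not match. The file may be corrupted. The MD5 of the BELLE-LLaMA-EXT-13B model I downloaded matches the published value, and I decoded it with /path/to_original_llama_7B/consolidated.00.pth, but the error above still occurred. The MD5 of the decoded file also differs from the one on Hugging Face. What could cause this, and how should I troubleshoot it?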
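One way to narrow this down is to hash every file involved (the downloaded encrypted file, the original LLaMA checkpoint used as the key, and the decoded output) and compare each against its published value. The sketch below is a minimal helper using Python's standard `hashlib`. The function names are hypothetical, and the expected MD5 values have to come from the model's release page. Since the path in the report mentions a 7B checkpoint while the model is 13B, it may also be worth confirming that the base checkpoint matches the model size, though that is only a guess from the path shown.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks so large checkpoints fit in RAM."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_md5):
    """Compare a file's MD5 with a published value, ignoring case and whitespace."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Running `matches_expected` on the input files first shows whether the mismatch originates before decoding (a corrupted or mismatched input) or during it.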