Fengyh
Modified `tencentpretrain/utils/constants.py` line 4 to: `with open("models/llama_special_tokens_map.json", mode="r", encoding="utf-8") as f:`
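For context, a minimal sketch of what that modified line does, assuming the JSON file is a simple special-tokens map (the sample keys and values below are illustrative, not taken from the repository):

```python
import json
import os
import tempfile

# Hypothetical minimal special-tokens map, mirroring the shape that
# models/llama_special_tokens_map.json is assumed to have.
sample_map = {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>"}

path = os.path.join(tempfile.mkdtemp(), "llama_special_tokens_map.json")
with open(path, mode="w", encoding="utf-8") as f:
    json.dump(sample_map, f, ensure_ascii=False)

# The modified line opens the map with an explicit UTF-8 encoding,
# so non-ASCII token strings load correctly regardless of the OS locale.
with open(path, mode="r", encoding="utf-8") as f:
    special_tokens_map = json.load(f)

print(special_tokens_map["bos_token"])
```

Passing `encoding="utf-8"` explicitly avoids platform-dependent default encodings (e.g. GBK on Chinese-locale Windows), which would otherwise break loading.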
You can refer to: https://github.com/Tencent/TencentPretrain/blob/main/scripts/convert_tencentpretrain_to_llama.py
This looks like an environment issue. Which GPU are you using, which GPU driver version, and which bitsandbytes version? Please provide more information.
You can refer to: https://github.com/fengyh3/llama_inference
Which model are you using? Could you provide more details?
@LymanLiuChina Fixed: https://github.com/fengyh3/llama_inference/tree/main
Have you tried setting "gradient_accumulation_steps" greater than 1 in the DeepSpeed config?
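For reference, a minimal sketch of the relevant DeepSpeed config fields (the values below are illustrative, not a recommendation for this issue):

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "train_batch_size": 32
}
```

With gradient accumulation, DeepSpeed requires `train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × number of GPUs` (here 4 × 8 × 1 = 32); a larger `gradient_accumulation_steps` keeps the per-step GPU memory footprint small while preserving the effective batch size.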
You can refer to this project for inference: https://github.com/fengyh3/llama_inference
Please refer to: https://github.com/Tencent/TencentPretrain/blob/main/models/llama/7b_config.json
Can you share more details about your pretraining, for example the bash command you ran? It seems your model config has some problems.