guoyaoming
Results
2
issues of
guoyaoming
model = AutoModelForCausalLM.from_pretrained( "baichuan-inc/Baichuan-13B-Chat", load_in_8bit=True, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True ) 加载int8的方式成功加载模型,刚加载完模型显存占用15G左右,但是只要一直对话,显存占用一直往上飙,不知道啥原因?
INFO 05-15 11:04:11 [model_runner.py:1110] Starting to load model ../Qwen2.5-7B-Instruct... Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00