How much GPU memory does it take to run the 13b model?

Jul 19 '23 03:07 Jonsun-N

You need a GPU with roughly 30 GB of VRAM.

Jul 19 '23 09:07 liupengfei0324
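A quick back-of-the-envelope check on the ~30 GB figure: the weights alone need roughly (number of parameters) × (bytes per parameter). A minimal sketch, assuming exactly 13 billion parameters and ignoring activations, the KV cache, and CUDA/framework overhead, all of which push the real requirement higher:

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumption: 13e9 parameters; activations, KV cache and CUDA context are ignored.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,  # bf16 is the same size
    "int8": 1.0,
    "4bit": 0.5,  # e.g. NF4; quantization constants add a little on top
}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Return the weight footprint in GB (10**9 bytes) for the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n = 13e9
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>5}: {weight_memory_gb(n, dtype):5.1f} GB")
```

This gives about 26 GB for fp16 weights, consistent with the ~30 GB answer once runtime overhead is added, and about 13 GB (int8) or 6.5 GB (4-bit) for the quantized options discussed later in the thread. Full fp32 weights need about 52 GB.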

Just to confirm: it's enough if several cards add up to that, right? A single card doesn't need more than 30 GB?

Jul 19 '23 09:07 Jonsun-N

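> Just to confirm: it's enough if several cards add up to that, right? A single card doesn't need more than 30 GB?

It should need a single card with 30 GB; as far as I can tell, VRAM can't be pooled across cards. You could consider quantizing to int8.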

Jul 22 '23 07:07 ffabbwl

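Can the model be split across multiple cards for deployment? I tested locally and couldn't deploy it on a single 24 GB 3090, so I'd like to try multiple cards.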

Jul 26 '23 02:07 ImmNaruto

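> Can the model be split across multiple cards for deployment? I tested locally and couldn't deploy it on a single 24 GB 3090, so I'd like to try multiple cards.

See DeepSpeed ZeRO stage 3.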

Aug 01 '23 07:08 Mewral
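ZeRO stage 3 partitions the model parameters (plus gradients and optimizer states when training) across the participating GPUs, and can additionally offload parameters to CPU memory. A minimal sketch that writes such a DeepSpeed config; the file name `ds_config.json`, the batch size of 1 and the CPU offload are assumptions to adjust, and the keys should be checked against the DeepSpeed configuration docs for your version:

```python
import json

# Hypothetical DeepSpeed config enabling ZeRO stage 3 with fp16.
ds_config = {
    # deepspeed.initialize() expects a batch-size key
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        # stage 3: parameters are sharded across GPUs instead of replicated
        "stage": 3,
        # optional: keep parameters in (pinned) CPU RAM, fetch them to GPU when needed
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}


def write_config(path: str = "ds_config.json") -> None:
    """Serialize the config so it can be handed to DeepSpeed."""
    with open(path, "w") as f:
        json.dump(ds_config, f, indent=2)


if __name__ == "__main__":
    write_config()
```

The file is then passed to DeepSpeed, for example through the launcher (`deepspeed --num_gpus=2 your_script.py`, where `your_script.py` is a hypothetical script that calls `deepspeed.initialize` with this config).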

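> Can the model be split across multiple cards for deployment? I tested locally and couldn't deploy it on a single 24 GB 3090, so I'd like to try multiple cards.

You could try llama.cpp; it's faster and supports multiple GPUs.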

Aug 15 '23 08:08 jinfengfeng

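I have two A10 cards and also get an error saying multiple cards aren't supported. Could you explain in detail how to use multiple cards?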

Sep 17 '23 09:09 nuaabuaa07
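Splitting a model across cards usually means placing consecutive decoder layers on different GPUs, which is what `device_map="auto"` in Hugging Face Transformers/Accelerate does. The sketch below only illustrates the idea with a simple greedy assignment; it is not Accelerate's actual algorithm, and the layer count (40), per-layer size (~0.63 GB in fp16) and 20 GB per-card budget are rough assumptions for a LLaMA-13B-style model on 24 GB cards (embeddings and the LM head are ignored):

```python
from collections import Counter
from typing import Dict, List


def assign_layers(n_layers: int, layer_gb: float, budgets_gb: List[float]) -> Dict[int, int]:
    """Greedily map layer index -> GPU index, filling GPU 0 first, then GPU 1, ...

    Raises ValueError if the layers do not fit in the combined budgets.
    """
    placement: Dict[int, int] = {}
    gpu, used = 0, 0.0
    for layer in range(n_layers):
        # Move on to the next GPU when this layer would overflow the current budget.
        while gpu < len(budgets_gb) and used + layer_gb > budgets_gb[gpu]:
            gpu, used = gpu + 1, 0.0
        if gpu == len(budgets_gb):
            raise ValueError(f"layer {layer} does not fit on any GPU")
        placement[layer] = gpu
        used += layer_gb
    return placement


if __name__ == "__main__":
    # Two cards, leaving ~4 GB each for activations and the KV cache.
    placement = assign_layers(40, 0.63, [20.0, 20.0])
    print(Counter(placement.values()))
```

With these numbers GPU 0 gets 31 layers and GPU 1 gets 9, whereas a single 20 GB budget raises `ValueError`. In Transformers the corresponding knobs are `device_map="auto"` and, optionally, `max_memory={0: "20GiB", 1: "20GiB"}` in `from_pretrained`. This kind of split saves memory but does not speed up generation, since the layers still run one GPU after another.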

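Is this the right way to load the model with 8-bit quantization?

```python
import torch
from transformers import LlamaForCausalLM

ziya_model_path = "/path/to/ziya-13b"  # hypothetical placeholder path

model = LlamaForCausalLM.from_pretrained(
    ziya_model_path,
    torch_dtype=torch.float16,
    load_in_8bit=True,   # requires the bitsandbytes package
    device_map="auto",   # let Accelerate place layers on the available GPUs
)
```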

Sep 25 '23 06:09 nuaabuaa07

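> Is this the right way to load the model with 8-bit quantization?

Adding `load_in_8bit=True` directly raises an error. You need to do it like this instead:

```python
from transformers import BitsAndBytesConfig, LlamaForCausalLM

ziya_model_path = "/path/to/ziya-13b"  # hypothetical placeholder path

# NF4 = 4-bit NormalFloat quantization via bitsandbytes (4-bit, not 8-bit)
nf4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")

model = LlamaForCausalLM.from_pretrained(
    ziya_model_path,
    quantization_config=nf4_config,
    device_map="auto",
)
```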

Sep 25 '23 06:09 nuaabuaa07

To use a single GPU: `CUDA_VISIBLE_DEVICES=0 python main.py`

Sep 25 '23 07:09 nuaabuaa07
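`CUDA_VISIBLE_DEVICES` is an ordinary environment variable, so it only affects the process it is set for and that process's children, and it has to be in place before CUDA is initialized. A minimal Python sketch that starts a child interpreter with the variable set only for that child; the child just prints the variable, standing in for `main.py`:

```python
import os
import subprocess
import sys


def run_with_gpu(gpu_ids: str, code: str) -> str:
    """Run `python -c code` in a child process that only sees the given GPUs."""
    # Copy the parent environment and override one variable; the parent is untouched.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu_ids)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_with_gpu("0", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"))
```

Inside such a child, PyTorch would see a single device, `cuda:0`, which maps to physical GPU 0.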

Can this 13b model be run on a CPU?

Mar 07 '24 06:03 chiugui

> Is this the right way to load the model with 8-bit quantization?
>
> Adding `load_in_8bit=True` directly raises an error. You need to do it like this instead:
>
> ```python
> nf4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
> model = LlamaForCausalLM.from_pretrained(
>     ziya_model_path, quantization_config=nf4_config, device_map="auto"
> )
> ```

Which configuration file should this be added to?

Mar 07 '24 06:03 chiugui

ChatLaw