mergekit
Keep getting an error when making a mixtral model
Whenever I make a Mixtral model using two Llama 2 13B models, I get the following error message:
Traceback (most recent call last):
  File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\username\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "H:\mergekit\Scripts\mergekit-moe.exe\__main__.py", line 7, in <module>
ValueError: The current `device_map` had weights offloaded to the disk. Please provide an `offload_folder` for them. Alternatively, make sure you have `safetensors` installed if the model you are using offers the weights in this format.
I have tried to edit modeling_utils.py to no avail. Does anyone know how I can fix this?
I searched online, and it seems the issue is that the model you're loading is too large for your available memory. Perhaps you can try adding the parameter offload_folder='offload_folder' to the AutoModelForCausalLM.from_pretrained call.
I can share some web pages about this problem, but they are in Chinese: https://zhuanlan.zhihu.com/p/647755430 https://zhuanlan.zhihu.com/p/605640431 https://blog.csdn.net/qq_40302568/article/details/135028085
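For reference, a minimal sketch of what that parameter looks like in a plain Transformers call (the model id and folder name are placeholders, and mergekit's own loading code may differ):

```python
from transformers import AutoModelForCausalLM

# Placeholder model id; in this case it would be whichever base model
# mergekit-moe is trying to load.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    device_map="auto",         # lets accelerate spread layers across GPU/CPU/disk
    offload_folder="offload",  # directory where weights that don't fit in memory are spilled
)
```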
Which file is AutoModelForCausalLM.from_pretrained in?
in mixtral_moe.py, bro
thanks
This error is because you don't have enough VRAM to run inference on the base model. There are a couple of things you can do to try to get around it. Adding offload_folder='offload_folder' might work. I'd recommend trying the --load-in-8bit or --load-in-4bit flags. You also might be able to use --device cpu, but that will be very slow.
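For example, a hypothetical invocation with those flags might look like this (the config and output paths are placeholders; check mergekit-moe --help for the exact options your version supports):

```
# quantize the base model while loading to reduce VRAM use
mergekit-moe mixtral-config.yml ./output-model --load-in-8bit
# or, if 8-bit still doesn't fit
mergekit-moe mixtral-config.yml ./output-model --load-in-4bit
# last resort: run entirely on CPU (very slow)
mergekit-moe mixtral-config.yml ./output-model --device cpu
```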