new out of memory issue (possible regression?)
### Describe the bug
Trying to load llama-30B-q4 on Win 10 with 64 GB RAM and a GV100 (32 GB). I had gotten similar errors in the past, but only intermittently; now I cannot load 30B at all. This feels like a bug even if it's not a regression but rather something that changed on my end, since loading should be able to fall back to a disk cache if there isn't enough RAM (right?)
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Reproduction
Unsure. Just running the program and trying to load 30B, same as yesterday.
### Screenshot
No response
### Logs
Starting the web UI...
Loading llama-30b-hf...
Loading model ...
Traceback (most recent call last):
  File "d:\ooba\text-generation-webui\server.py", line 241, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "d:\ooba\text-generation-webui\modules\models.py", line 101, in load_model
    model = load_quantized(model_name)
  File "d:\ooba\text-generation-webui\modules\GPTQ_loader.py", line 64, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
  File "d:\ooba\text-generation-webui\repositories\GPTQ-for-LLaMa\llama.py", line 245, in load_quant
    model.load_state_dict(torch.load(checkpoint))
  File "d:\ooba\installer_files\env\lib\site-packages\torch\serialization.py", line 789, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "d:\ooba\installer_files\env\lib\site-packages\torch\serialization.py", line 1131, in _load
    result = unpickler.load()
  File "d:\ooba\installer_files\env\lib\site-packages\torch\serialization.py", line 1101, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "d:\ooba\installer_files\env\lib\site-packages\torch\serialization.py", line 1079, in load_tensor
    storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage).storage().untyped()
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 425984001 bytes.
### System Info
- EPYC 7542
- 64 GB RAM
- GV100
- Windows 10
Just successfully launched. I changed nothing, just ran the script again on the off chance it would work. ¯\_(ツ)_/¯
Got this too; if anyone finds a fix, lmk.
You will get this error if you have a RAM disk or virtual disk running prior to loading your model. You can set up your RAM disk after you load the model, though.
I'm sorry, but I haven't heard of either.
Oh, maybe it's something different from the problem I was having 🤷♂️
I have to use swap space on my drive because I only have 8 GB RAM; also, loading any model in regular mode results in Python just crashing.
Duplicate of https://github.com/oobabooga/text-generation-webui/issues/492
Closing, as the problem has been solved for the original poster.
FWIW, I still occasionally get this. It was always intermittent and seems better now, but it does still happen from time to time. I just bumped to the latest commit; I'll keep this thread posted if it happens again.