new out of memory issue (possible regression?)
### Describe the bug
Trying to load llama-30B-q4 on Win 10 with 64 GB RAM and a GV100 (32 GB). I had gotten similar errors in the past, but only intermittently; now I cannot load 30B at all. This feels like a bug even if it's not a regression but rather something that changed on my end, since loading should be able to fall back to a disk cache if there isn't enough RAM (right?)
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Reproduction
Unsure. Just running the program and trying to load 30B, same as yesterday.
### Screenshot
No response
### Logs
Starting the web UI...
Loading llama-30b-hf...
Loading model ...
Traceback (most recent call last):
  File "d:\ooba\text-generation-webui\server.py", line 241, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "d:\ooba\text-generation-webui\modules\models.py", line 101, in load_model
    model = load_quantized(model_name)
  File "d:\ooba\text-generation-webui\modules\GPTQ_loader.py", line 64, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
  File "d:\ooba\text-generation-webui\repositories\GPTQ-for-LLaMa\llama.py", line 245, in load_quant
    model.load_state_dict(torch.load(checkpoint))
  File "d:\ooba\installer_files\env\lib\site-packages\torch\serialization.py", line 789, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "d:\ooba\installer_files\env\lib\site-packages\torch\serialization.py", line 1131, in _load
    result = unpickler.load()
  File "d:\ooba\installer_files\env\lib\site-packages\torch\serialization.py", line 1101, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "d:\ooba\installer_files\env\lib\site-packages\torch\serialization.py", line 1079, in load_tensor
    storage = zip_file.get_storage_from_record(name, numel, torch.UntypedStorage).storage().untyped()
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 425984001 bytes.
### System Info
- EPYC 7542
- 64 GB RAM
- GV100
- Windows 10
Just successfully launched. I changed nothing, just ran the script again on the off chance it would work. ¯\_(ツ)_/¯
Got this too; if anyone finds a fix, lmk.
You will get this error if you have a RAM disk or virtual disk running prior to loading your model. You can set up your RAM disk after you load the model, though.
I'm sorry, but I haven't heard of either.
Oh, maybe it's something different from the problem I was having 🤷♂️
I have to use swap space on my drive because I only have 8 GB RAM; also, loading any model in regular mode results in Python just crashing.
Duplicate of https://github.com/oobabooga/text-generation-webui/issues/492
Closing, as the problem has been solved for the original poster.
FWIW, I still occasionally get this. It was always intermittent and seems better now, but it does still happen from time to time. I just bumped to the latest commit; I'll keep this thread posted if it happens again.