
CUDA error when loading LLaMa model in Ubuntu WSL

Open TalhaErenY opened this issue 1 year ago • 2 comments

Describe the bug

After following the user guide for setting up WSL, Nvidia's WSL CUDA driver setup, and the regular installation steps, I get a bitsandbytes error on startup: "CUDA extension not installed". Unlike other open issues, my libcuda is detected; the failure happens afterwards. I already followed the workaround linked on the main GitHub page (copying the cpu .so, etc.) and none of it helped.
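(For context, the workaround referenced there is roughly the following; the path matches the logs below, the exact .so suffix depends on the installed CUDA version, and this is only a sketch of that workaround, not a guaranteed fix:)

cd ~/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/
cp libbitsandbytes_cuda113.so libbitsandbytes_cpu.so   # overwrite the CPU-only library with the CUDA build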

Is there an existing issue for this?

  • [x] I have searched the existing issues

Reproduction

python server.py --load-in-4bit --model llama-7b, after following the installation guide.

Screenshot

No response

Logs

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/eren/miniconda3/envs/textgen/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Loading llama-7b...
CUDA extension not installed.
Traceback (most recent call last):
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 259, in hf_raise_for_status
    response.raise_for_status()
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/models/llama-7b/resolve/main/config.json

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/utils/hub.py", line 409, in cached_file
    resolved_file = hf_hub_download(
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1134, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 120, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1475, in get_hf_file_metadata
    hf_raise_for_status(r)
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 291, in hf_raise_for_status
    raise RepositoryNotFoundError(message, response) from e
huggingface_hub.utils._errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-6415c073-6304906b59b631b6055761ac)

Repository Not Found for url: https://huggingface.co/models/llama-7b/resolve/main/config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated.
Invalid username or password.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/eren/ChatAI/text-generation-webui/server.py", line 236, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/eren/ChatAI/text-generation-webui/modules/models.py", line 100, in load_model
    model = load_quantized(model_name)
  File "/home/eren/ChatAI/text-generation-webui/modules/GPTQ_loader.py", line 55, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.gptq_bits)
  File "/home/eren/ChatAI/text-generation-webui/repositories/GPTQ-for-LLaMa/llama.py", line 221, in load_quant
    config = LlamaConfig.from_pretrained(model)
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/configuration_utils.py", line 546, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/configuration_utils.py", line 573, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/configuration_utils.py", line 628, in _get_config_dict
    resolved_config_file = cached_file(
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/utils/hub.py", line 424, in cached_file
    raise EnvironmentError(
OSError: models/llama-7b is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

System Info

Windows 10 (latest update), RTX 2060 laptop GPU, WSL Ubuntu (latest non-LTS version).

TalhaErenY avatar Mar 18 '23 13:03 TalhaErenY

You can't use Hugging Face without generating a login token. You have to download those files manually.
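(For example, roughly one of the following; <organization>/<model-name> is a placeholder, not a specific repo to use:)

huggingface-cli login                                   # authenticate so transformers can fetch from the Hub
python download-model.py <organization>/<model-name>   # or pull the files into models/ with the webui's downloader script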

Ph0rk0z avatar Mar 18 '23 15:03 Ph0rk0z

They show up in the folder fine. I also had a local download, but had issues transferring it over to my WSL session. Is there a difference between the files I download manually from their website versus the python download-model.py method? I will re-download manually if that is the case.

(screenshot of the model files present in the models folder)

TalhaErenY avatar Mar 18 '23 15:03 TalhaErenY

Try python server.py --load-in-4bit --model llama-7b-hf instead. You are running python server.py --load-in-4bit --model llama-7b, but your model folder is not named that way.
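(You can confirm the exact folder name from inside the text-generation-webui directory, for example:)

ls models/
# llama-7b-hf   <- pass this name to --model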

CrazyKrow avatar Mar 18 '23 18:03 CrazyKrow

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/eren/miniconda3/envs/textgen/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Loading llama-7b-hf...
CUDA extension not installed.
Loading model ...
Done.
Traceback (most recent call last):
  File "/home/eren/ChatAI/text-generation-webui/server.py", line 236, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/home/eren/ChatAI/text-generation-webui/modules/models.py", line 163, in load_model
    tokenizer = AutoTokenizer.from_pretrained(Path(f"models/{shared.model_name}/"))
  File "/home/eren/miniconda3/envs/textgen/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 677, in from_pretrained
    raise ValueError(
ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.

I don't understand this error, though at least it's a new one. Do you know what is wrong now, exactly? I did run:

pip install git+https://github.com/huggingface/transformers

TalhaErenY avatar Mar 18 '23 18:03 TalhaErenY

You have to rename "LLaMATokenizer" to "LlamaTokenizer" in the tokenizer_config.json in your model folder.
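(For example, from inside WSL, a one-liner along these lines would do it, assuming the model sits under models/llama-7b-hf:)

sed -i 's/LLaMATokenizer/LlamaTokenizer/' models/llama-7b-hf/tokenizer_config.json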

CrazyKrow avatar Mar 18 '23 18:03 CrazyKrow

Thank you, that has solved it and I can successfully load the model. However, I'm having trouble port forwarding WSL. I used netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=7860 connectaddress=localhost connectport=7860 as per the guide, but I can't connect to the UI. What am I doing wrong here? I try to load http://127.0.0.1:7860/ and http://0.0.0.0:7860/, and neither connects.
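(For reference, the rule can be inspected or removed again from an elevated Windows prompt like this:)

netsh interface portproxy show v4tov4
netsh interface portproxy delete v4tov4 listenaddress=0.0.0.0 listenport=7860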

TalhaErenY avatar Mar 18 '23 18:03 TalhaErenY

Go to the Windows terminal and type ipconfig, then copy the IPv4 address you see there, paste it into your browser, append :7860, and hit enter. For example, if my IPv4 is 190.160.1.30, I would type 190.160.1.30:7860.

CrazyKrow avatar Mar 18 '23 18:03 CrazyKrow

Neither the WSL vEthernet adapter's IPv4 nor the computer's IPv4 works, for some reason.

TalhaErenY avatar Mar 18 '23 18:03 TalhaErenY

I've worked around the problem for now by installing Chrome inside WSL; this works fine.
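(In case anyone wants to do the same, a minimal sketch of installing Chrome inside the WSL Ubuntu instance:)

wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb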

TalhaErenY avatar Mar 18 '23 19:03 TalhaErenY

You have to rename "LLaMATokenizer" to "LlamaTokenizer" in the tokenizer_config.json in your model folder.

How can I do this in Linux (WSL)?

iChristGit avatar Mar 19 '23 23:03 iChristGit

An easy, Windows-friendly way is to either run "explorer.exe ." inside your models directory, or to browse there with File Explorer under Network in the bottom left (where you'll see your Linux install). You can then open the JSON file with your text editor of choice and edit it. The Linux way would be to cd into the directory and call your preferred text editor (like gedit, assuming you installed it).
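For example (path as in the logs above; substitute your editor of choice):

cd ~/ChatAI/text-generation-webui/models/llama-7b-hf
explorer.exe .                 # opens this folder in Windows File Explorer
nano tokenizer_config.json     # or edit directly in the terminal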

TalhaErenY avatar Mar 19 '23 23:03 TalhaErenY