
LoRA Training CPU not working, llama_tokenize: too many tokens

Open · 94bb494nd41f opened this issue 3 years ago · 1 comment

Hey there, I tried to replicate what bublint did, creating a LoRA based on some documentation, but it's simply not working. Is it because I am running a quantized model on the CPU, or is it just a bug?

Loading eachadea_ggml-vicuna-7b-1-1...
llama.cpp weights detected: models\eachadea_ggml-vicuna-7b-1-1\ggml-vicuna-7b-1.1-q4_1.bin

llama.cpp: loading model from models\eachadea_ggml-vicuna-7b-1-1\ggml-vicuna-7b-1.1-q4_1.bin
llama_model_load_internal: format     = ggjt v1 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 3 (mostly Q4_1)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =  59.11 KB
llama_model_load_internal: mem required  = 6612.57 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size  = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading the extension "gallery"... Ok.
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Warning: LoRA training has only currently been validated for LLaMA, OPT, GPT-J, and GPT-NeoX models. (Found model type: LlamaCppModel)
Warning: It is highly recommended you use `--load-in-8bit` for LoRA training.
Loading raw text file dataset...
llama_tokenize: too many tokens
Traceback (most recent call last):
  File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\gradio\routes.py", line 395, in run_predict
    output = await app.get_blocks().process_api(
  File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\gradio\blocks.py", line 1193, in process_api
    result = await self.call_function(
  File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\gradio\blocks.py", line 930, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\gradio\utils.py", line 491, in async_iteration
    return next(iterator)
  File "C:\TCHT\oobabooga_windows\text-generation-webui\modules\training.py", line 262, in do_train
    tokens = shared.tokenizer.encode(raw_text)
  File "C:\TCHT\oobabooga_windows\text-generation-webui\modules\llamacpp_model_alternative.py", line 38, in encode
    return self.model.tokenize(string)
  File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\llama_cpp\llama.py", line 137, in tokenize
    raise RuntimeError(f'Failed to tokenize: text="{text}" n_tokens={n_tokens}')
RuntimeError: Failed to tokenize: text="
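The traceback shows `training.py` passing the entire raw text file to `llama_cpp`'s `tokenize` in a single call, which fails once the input exceeds the tokenizer's buffer ("too many tokens"). A minimal workaround sketch, assuming the limit only applies per call: split the raw text into bounded chunks before encoding. The helpers `chunk_text` and `encode_in_chunks` below are hypothetical illustrations, not part of text-generation-webui; `encode` stands in for `shared.tokenizer.encode`.

```python
# Hypothetical workaround: llama_tokenize rejects very large inputs, so split
# the raw training text into bounded chunks and encode each chunk separately.

def chunk_text(raw_text: str, max_chars: int = 2048) -> list[str]:
    """Split raw_text into chunks of at most max_chars characters,
    preferring to break at newlines so lines are not cut mid-way."""
    chunks = []
    while raw_text:
        if len(raw_text) <= max_chars:
            chunks.append(raw_text)
            break
        # Break at the last newline inside the window, if there is one.
        cut = raw_text.rfind("\n", 0, max_chars)
        if cut <= 0:
            cut = max_chars
        chunks.append(raw_text[:cut])
        raw_text = raw_text[cut:].lstrip("\n")
    return chunks

def encode_in_chunks(raw_text, encode, max_chars: int = 2048) -> list:
    """Tokenize raw_text chunk by chunk with the given encode callable."""
    tokens = []
    for chunk in chunk_text(raw_text, max_chars):
        tokens.extend(encode(chunk))
    return tokens
```

Note this only sidesteps the buffer error; token counts near chunk boundaries can differ slightly from encoding the whole text at once, and (as noted below) it would not make LoRA training work on a ggml model anyway.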

I am running this on Windows 10, AMD Ryzen5 3500U, 22 GB RAM in CPU mode.

94bb494nd41f · Apr 23 '23 17:04

This is my first issue; somehow mine looks different?

94bb494nd41f · Apr 23 '23 17:04

LoRAs are not implemented for ggml (llama.cpp) models.

oobabooga · Apr 24 '23 00:04