text-generation-webui
LoRA Training CPU not working, llama_tokenize: too many tokens
Hey there, I tried to replicate what bublint did, creating a LoRA based on some documentation, but it's simply not working. Is it because I am running a quantized model on CPU, or is it just a bug?
Loading eachadea_ggml-vicuna-7b-1-1...
llama.cpp weights detected: models\eachadea_ggml-vicuna-7b-1-1\ggml-vicuna-7b-1.1-q4_1.bin
llama.cpp: loading model from models\eachadea_ggml-vicuna-7b-1-1\ggml-vicuna-7b-1.1-q4_1.bin
llama_model_load_internal: format = ggjt v1 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 3 (mostly Q4_1)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 59.11 KB
llama_model_load_internal: mem required = 6612.57 MB (+ 1026.00 MB per state)
llama_init_from_file: kv self size = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading the extension "gallery"... Ok.
Running on local URL: http://127.0.0.1:7860
To create a public link, set `share=True` in `launch()`.
Warning: LoRA training has only currently been validated for LLaMA, OPT, GPT-J, and GPT-NeoX models. (Found model type: LlamaCppModel)
Warning: It is highly recommended you use `--load-in-8bit` for LoRA training.
Loading raw text file dataset...
llama_tokenize: too many tokens
Traceback (most recent call last):
File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\gradio\routes.py", line 395, in run_predict
output = await app.get_blocks().process_api(
File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\gradio\blocks.py", line 1193, in process_api
result = await self.call_function(
File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\gradio\blocks.py", line 930, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\gradio\utils.py", line 491, in async_iteration
return next(iterator)
File "C:\TCHT\oobabooga_windows\text-generation-webui\modules\training.py", line 262, in do_train
tokens = shared.tokenizer.encode(raw_text)
File "C:\TCHT\oobabooga_windows\text-generation-webui\modules\llamacpp_model_alternative.py", line 38, in encode
return self.model.tokenize(string)
File "C:\TCHT\oobabooga_windows\installer_files\env\lib\site-packages\llama_cpp\llama.py", line 137, in tokenize
raise RuntimeError(f'Failed to tokenize: text="{text}" n_tokens={n_tokens}')
RuntimeError: Failed to tokenize: text="
I am running this on Windows 10 with an AMD Ryzen 5 3500U and 22 GB RAM, in CPU mode.
This is my first issue; somehow mine looks different from the others?
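For anyone hitting the same `llama_tokenize: too many tokens` error: the `tokenize()` call in llama-cpp-python at this point used a fixed-size token buffer, so handing it an entire raw-text training file in one call fails outright. Below is a minimal sketch of a workaround that splits the text into pieces before tokenizing; the model path matches the log above, but `CHUNK_CHARS` and `tokenize_in_chunks` are illustrative assumptions, not part of the webui.

```python
# Sketch of a chunked-tokenization workaround (assumes llama-cpp-python's
# Llama.tokenize(), which takes UTF-8 bytes). CHUNK_CHARS is an arbitrary
# guess chosen to keep each call well under the internal token buffer.
from llama_cpp import Llama

llm = Llama(model_path="models/eachadea_ggml-vicuna-7b-1-1/ggml-vicuna-7b-1.1-q4_1.bin")

CHUNK_CHARS = 2048  # assumption: short enough that each chunk tokenizes safely

def tokenize_in_chunks(text: str) -> list[int]:
    """Tokenize long raw text piecewise instead of in one oversized call."""
    tokens: list[int] = []
    for start in range(0, len(text), CHUNK_CHARS):
        chunk = text[start:start + CHUNK_CHARS]
        # Note: depending on the library version, each call may prepend a
        # BOS token to its chunk.
        tokens.extend(llm.tokenize(chunk.encode("utf-8")))
    return tokens
```

This only sidesteps the tokenizer error, though; it does not make LoRA training work on a ggml model, as the reply below explains.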
LoRAs are not implemented for ggml (llama.cpp) models.
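In other words, the two warnings in the log are the real answer: the LoRA trainer expects a Transformers-format checkpoint (LLaMA, OPT, GPT-J, or GPT-NeoX), ideally loaded in 8-bit. As a rough sketch of that path outside the webui, using Hugging Face Transformers and PEFT; the model name and LoRA hyperparameters here are illustrative assumptions, not values taken from the webui:

```python
# Rough sketch: LoRA training setup on a Transformers (non-ggml) LLaMA model.
# "huggyllama/llama-7b" and the LoRA hyperparameters are assumptions for
# illustration; any HF-format LLaMA checkpoint would be handled the same way.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "huggyllama/llama-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # roughly what the webui's --load-in-8bit flag does
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections, common LoRA targets
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small LoRA matrices are trainable
```

One caveat for this specific report: 8-bit loading goes through bitsandbytes, which requires a CUDA GPU, so on a CPU-only machine like the one described above this path would still need different hardware.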