
The model does not generate a response.


At first glance, it looks like a grand project. Unfortunately, I couldn't get it to work in practice. Yesterday I spent half a day trying to set it up from scratch in various configurations. Today I updated my graphics card driver, because I noticed the installer was pulling in CUDA 12.3 while my driver only supported CUDA 12.2. I now have the latest driver, which supports CUDA 12.4, but that doesn't help either. I'm not a programmer, so I don't know how to deal with the errors that keep popping up. Sadly, for the time being, I have to refrain from using this product.

Expected Behavior

I expect that after installation and configuration, I will receive a response from the model.

Current Behavior

Starting message generation by lollms
Text generation requested by client: QiheU37Pd18z5tUlAAAF
Started generation task
Received message : Hi
INFO: ::1:53795 - "GET /user_infos/default_user.svg HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\responses.py", line 326, in __call__
    stat_result = await anyio.to_thread.run_sync(os.stat, self.path)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\anyio\_backends\_asyncio.py", line 2144, in run_sync_in_worker_thread
    return await future
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'D:\lollms\personal_data\user_infos\default_user.svg'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\uvicorn\protocols\http\h11_impl.py", line 408, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\engineio\async_drivers\asgi.py", line 67, in __call__
    await self.other_asgi_app(scope, receive, send)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\fastapi\applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\middleware\errors.py", line 186, in __call__
    raise exc
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\middleware\errors.py", line 164, in __call__
    await self.app(scope, receive, send)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\middleware\exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\routing.py", line 77, in app
    await response(scope, receive, send)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\starlette\responses.py", line 329, in __call__
    raise RuntimeError(f"File at path {self.path} does not exist.")
RuntimeError: File at path D:\lollms\personal_data\user_infos\default_user.svg does not exist.
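This first error only concerns the missing default avatar: personal_data\user_infos\default_user.svg was never created, so the static file route returns a 500. It looks cosmetic rather than related to the generation failure itself. A minimal workaround sketch, assuming any valid SVG at that path would satisfy the route (the grey-circle SVG is just a stand-in, not the real lollms asset):

```python
# Hedged workaround sketch, not an official lollms fix: drop a placeholder
# avatar at the path the server is looking for so the 500 goes away.
from pathlib import Path

avatar = Path(r"D:\lollms\personal_data\user_infos\default_user.svg")
avatar.parent.mkdir(parents=True, exist_ok=True)   # create user_infos if missing
avatar.write_text(
    '<svg xmlns="http://www.w3.org/2000/svg" width="64" height="64">'
    '<circle cx="32" cy="32" r="30" fill="#888888"/></svg>',
    encoding="utf-8",
)
```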
INFO: ::1:53796 - "GET /assets/loading-c3bdfb0a.svg HTTP/1.1" 200 OK
warmup for generating up to 3981 tokens
INFO: ::1:53796 - "GET /assets/Roboto-Regular-7277cfb8.ttf HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "D:\lollms\lollms-webui\zoos\bindings_zoo\hugging_face\__init__.py", line 694, in generate
    self.model.generate(
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\auto_gptq\modeling\_base.py", line 447, in generate
    return self.model.generate(**kwargs)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\transformers\generation\utils.py", line 1592, in generate
    return self.sample(
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\transformers\generation\utils.py", line 2696, in sample
    outputs = self(
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1176, in forward
    outputs = self.model(
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1019, in forward
    layer_outputs = decoder_layer(
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 740, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\auto_gptq\nn_modules\fused_llama_attn.py", line 72, in forward
    cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\lollms\installer_files\lollms_env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: LlamaLinearScalingRotaryEmbedding.forward() missing 1 required positional argument: 'position_ids'

Finished executing the generation

Done Generation

╔══════════════════════════════════════════════════╗
║ Done ║
╚══════════════════════════════════════════════════╝
INFO: ::1:53796 - "GET /assets/ok-a0b56451.svg HTTP/1.1" 200 OK
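The actual generation failure is the TypeError above: auto_gptq's fused LLaMA attention (fused_llama_attn.py) calls the rotary embedding with the pre-4.38 transformers signature, which no longer matches LlamaLinearScalingRotaryEmbedding.forward(). A hedged sketch of one possible workaround, assuming the binding ultimately loads the model through auto_gptq's AutoGPTQForCausalLM.from_quantized (whether the lollms binding exposes this option is not something I can confirm):

```python
# Hedged sketch, not the lollms binding's actual code: loading the quantized
# model with fused-attention injection disabled avoids the code path in
# auto_gptq\nn_modules\fused_llama_attn.py that raises the TypeError.
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    r"D:\lollms\personal_data\models\gptq\DeepMagic-Coder-7b-GPTQ",
    device_map="auto",
    inject_fused_attention=False,  # keep transformers' own LlamaAttention
)
```

Alternatively, downgrading transformers inside lollms_env to a pre-4.38 release should restore the old rotary-embedding signature that the installed auto_gptq expects. Both options are guesses from the traceback rather than verified fixes.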

Steps to Reproduce


  1. Ran win_install.bat (didn’t notice any errors during installation)
  2. Installed the Hugging Face binding
  3. Installed the DeepMagic-Coder-7b-GPTQ model. I don't know how helpful the message I received is:

Requested updating of setting model_name to DeepMagic-Coder-7b-GPTQ
Changing model to: DeepMagic-Coder-7b-GPTQ
Building model DeepMagic-Coder-7b-GPTQ
------- Cuda VRAM usage -------
{'nb_gpus': 1, 'gpu_0_total_vram': 25769803776, 'gpu_0_used_vram': 8388608, 'gpu_0_model': 'Tesla P40'}
Cleared cache
------- Cuda VRAM usage -------
{'nb_gpus': 1, 'gpu_0_total_vram': 25769803776, 'gpu_0_used_vram': 8388608, 'gpu_0_model': 'Tesla P40'}
Creating tokenizer D:\lollms\personal_data\models\gptq\DeepMagic-Coder-7b-GPTQ
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Recovering generation config D:\lollms\personal_data\models\gptq\DeepMagic-Coder-7b-GPTQ
Creating model D:\lollms\personal_data\models\gptq\DeepMagic-Coder-7b-GPTQ
Using device map: auto
CUDA extension not installed.

CUDA extension not installed.

Exllamav2 kernel is not installed, reset disable_exllamav2 to True. This may because you installed auto_gptq using a pre-build wheel on Windows, in which exllama_kernels are not compiled. To use exllama_kernels to further speedup inference, you can re-install auto_gptq from source.

CUDA kernels for auto_gptq are not installed, this will result in very slow inference speed. This may because:

  1. You disable CUDA extensions compilation by setting BUILD_CUDA_EXT=0 when install auto_gptq from source.
  2. You are using pytorch without CUDA support.
  3. CUDA and nvcc are not installed in your device.

D:\lollms\installer_files\lollms_env\Lib\site-packages\transformers\modeling_utils.py:4193: FutureWarning: _is_quantized_training_enabled is going to be deprecated in transformers 4.39.0. Please use model.hf_quantizer.is_trainable instead
  warnings.warn(
The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the LlamaAttention class
The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the LlamaAttention class
Skipping module injection for FusedLlamaMLPForQuantizedModel as currently not supported with use_triton=False.

Couldn't force exllama max imput size. This is a model that doesn't support exllama. Model loaded successfully

New model OK
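The repeated "CUDA extension not installed." messages and the auto_gptq warning above suggest the GPTQ CUDA kernels were never built, so inference falls back to a slow path even before the TypeError is hit. A small diagnostic sketch (my suggestion, not something lollms ships) could narrow down which of the three listed causes applies; it assumes it is run with the Python interpreter inside lollms_env:

```python
# Hedged diagnostic sketch: check whether PyTorch itself has CUDA support,
# whether a CUDA compiler is present, and which library versions are installed.
import shutil

import torch
import transformers

print("torch:", torch.__version__)                  # a "+cpu" suffix would mean a CPU-only build (cause 2)
print("torch CUDA runtime:", torch.version.cuda)    # None also indicates no CUDA support in PyTorch
print("CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)    # the 4.39 deprecation warnings above suggest 4.38.x
print("nvcc on PATH:", shutil.which("nvcc"))         # None means no CUDA compiler installed (cause 3)
```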

Possible Solution

I'm not a programmer, so I can't offer a solution myself.

Context

Motherboard: Machinist X99
CPU: Intel Xeon E5-2698 v3
RAM: 32 GB
Video card: NVIDIA Tesla P40 24 GB
Driver version: 551.61

Screenshots

(two screenshot attachments)

edyapd · Mar 15 '24