text-generation-webui
Connection timed out
Describe the bug
The issue started after updating today. I load a model, get through a few generations, and then the program crashes.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Start the web UI, load a model, and start generating; the program consistently crashes within about two minutes.
Screenshot
No response
Logs
21:59:18-930341 INFO Starting Text generation web UI
21:59:18-933081 INFO Loading the extension "gallery"
Running on local URL: http://127.0.0.1:7860
21:59:24-394228 INFO Loading "zephyr-7b-beta.Q4_K_S.gguf"
21:59:24-466515 INFO llama.cpp weights detected:
"models/zephyr-7b-beta.Q4_K_S.gguf"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: yes
ggml_init_cublas: CUDA_USE_TENSOR_CORES: no
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1660 Ti, compute capability 7.5, VMM: yes
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from models/zephyr-7b-beta.Q4_K_S.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = huggingfaceh4_zephyr-7b-beta
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 14
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 217 tensors
llama_model_loader: - type q5_K: 8 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_K - Small
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 3.86 GiB (4.57 BPW)
llm_load_print_meta: general.name = huggingfaceh4_zephyr-7b-beta
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 2 '</s>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 70.31 MiB
llm_load_tensors: CUDA0 buffer size = 3877.55 MiB
..................................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CUDA_Host input buffer size = 13.02 MiB
llama_new_context_with_model: CUDA0 compute buffer size = 164.00 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 8.00 MiB
llama_new_context_with_model: graph splits (measure): 2
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
Model metadata: {'tokenizer.ggml.padding_token_id': '2', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '32768', 'general.name': 'huggingfaceh4_zephyr-7b-beta', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '8', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '14'}
Using fallback chat format: None
21:59:25-251158 INFO LOADER: "llama.cpp"
21:59:25-252024 INFO TRUNCATION LENGTH: 2048
21:59:25-252676 INFO INSTRUCTION TEMPLATE: "Alpaca"
21:59:25-253254 INFO Loaded the model in 0.86 seconds.
llama_print_timings: load time = 602.62 ms
llama_print_timings: sample time = 79.49 ms / 198 runs ( 0.40 ms per token, 2490.82 tokens per second)
llama_print_timings: prompt eval time = 599.52 ms / 154 tokens ( 3.89 ms per token, 256.87 tokens per second)
llama_print_timings: eval time = 7621.22 ms / 197 runs ( 38.69 ms per token, 25.85 tokens per second)
llama_print_timings: total time = 8725.49 ms / 351 tokens
Output generated in 9.07 seconds (21.73 tokens/s, 197 tokens, context 154, seed 1568613668)
Llama.generate: prefix-match hit
llama_print_timings: load time = 602.62 ms
llama_print_timings: sample time = 125.18 ms / 318 runs ( 0.39 ms per token, 2540.26 tokens per second)
llama_print_timings: prompt eval time = 275.28 ms / 32 tokens ( 8.60 ms per token, 116.25 tokens per second)
llama_print_timings: eval time = 17877.87 ms / 317 runs ( 56.40 ms per token, 17.73 tokens per second)
llama_print_timings: total time = 18998.58 ms / 349 tokens
Output generated in 19.34 seconds (16.39 tokens/s, 317 tokens, context 383, seed 1815112653)
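For reference, the throughput figures above are internally consistent; here is a minimal sanity check in Python using the numbers from the second generation. Note that the web UI's tokens/s divides generated tokens by total wall-clock time, so it is slightly lower than llama.cpp's raw eval rate:

```python
# Recompute the rates reported in the log from the raw timing values.
eval_ms, eval_runs = 17877.87, 317   # llama_print_timings: eval time
total_s, new_tokens = 19.34, 317     # web UI's "Output generated in ..." line

print(f"eval rate:    {eval_runs / (eval_ms / 1000):.2f} tokens/s")  # ~17.73
print(f"overall rate: {new_tokens / total_s:.2f} tokens/s")          # ~16.39
```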
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_transports/default.py:69 │
│ in map_httpcore_exceptions │
│ │
│ 68 try: │
│ ❱ 69 yield │
│ 70 except Exception as exc: │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_transports/default.py:233 │
│ in handle_request │
│ │
│ 232 with map_httpcore_exceptions(): │
│ ❱ 233 resp = self._pool.handle_request(req) │
│ 234 │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_sync/connection_pool.p │
│ y:216 in handle_request │
│ │
│ 215 self._close_connections(closing) │
│ ❱ 216 raise exc from None │
│ 217 │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_sync/connection_pool.p │
│ y:196 in handle_request │
│ │
│ 195 # Send the request on the assigned connection. │
│ ❱ 196 response = connection.handle_request( │
│ 197 pool_request.request │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_sync/http_proxy.py:207 │
│ in handle_request │
│ │
│ 206 ) │
│ ❱ 207 return self._connection.handle_request(proxy_request) │
│ 208 │
│ │
│ ... 1 frames hidden ... │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_sync/connection.py:76 │
│ in handle_request │
│ │
│ 75 if self._connection is None: │
│ ❱ 76 stream = self._connect(request) │
│ 77 │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_sync/connection.py:122 │
│ in _connect │
│ │
│ 121 with Trace("connect_tcp", logger, request, kwargs) │
│ ❱ 122 stream = self._network_backend.connect_tcp(**k │
│ 123 trace.return_value = stream │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_backends/sync.py:205 │
│ in connect_tcp │
│ │
│ 204 │
│ ❱ 205 with map_exceptions(exc_map): │
│ 206 sock = socket.create_connection( │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/contextlib.py:155 in __exit__ │
│ │
│ 154 try: │
│ ❱ 155 self.gen.throw(typ, value, traceback) │
│ 156 except StopIteration as exc: │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_exceptions.py:14 in │
│ map_exceptions │
│ │
│ 13 if isinstance(exc, from_exc): │
│ ❱ 14 raise to_exc(exc) from exc │
│ 15 raise # pragma: nocover │
╰──────────────────────────────────────────────────────────────────────────────╯
ConnectTimeout: [Errno 110] Connection timed out
The above exception was the direct cause of the following exception:
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/serve │
│ r.py:255 in <module> │
│ │
│ 254 # Launch the web UI │
│ ❱ 255 create_interface() │
│ 256 while True: │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/serve │
│ r.py:161 in create_interface │
│ │
│ 160 with OpenMonkeyPatch(): │
│ ❱ 161 shared.gradio['interface'].launch( │
│ 162 max_threads=64, │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/gradio/blocks.py:2106 in launch │
│ │
│ 2105 # Workaround by triggering the app endpoint │
│ ❱ 2106 httpx.get( │
│ 2107 f"{self.local_url}startup-events", verify=ssl_ver │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_api.py:198 in get │
│ │
│ 197 """ │
│ ❱ 198 return request( │
│ 199 "GET", │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_api.py:106 in request │
│ │
│ 105 ) as client: │
│ ❱ 106 return client.request( │
│ 107 method=method, │
│ │
│ ... 3 frames hidden ... │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_client.py:979 in │
│ _send_handling_redirects │
│ │
│ 978 │
│ ❱ 979 response = self._send_single_request(request) │
│ 980 try: │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_client.py:1015 in │
│ _send_single_request │
│ │
│ 1014 with request_context(request=request): │
│ ❱ 1015 response = transport.handle_request(request) │
│ 1016 │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_transports/default.py:232 │
│ in handle_request │
│ │
│ 231 ) │
│ ❱ 232 with map_httpcore_exceptions(): │
│ 233 resp = self._pool.handle_request(req) │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/contextlib.py:155 in __exit__ │
│ │
│ 154 try: │
│ ❱ 155 self.gen.throw(typ, value, traceback) │
│ 156 except StopIteration as exc: │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_transports/default.py:86 │
│ in map_httpcore_exceptions │
│ │
│ 85 message = str(exc) │
│ ❱ 86 raise mapped_exc(message) from exc │
│ 87 │
╰──────────────────────────────────────────────────────────────────────────────╯
ConnectTimeout: [Errno 110] Connection timed out
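Both tracebacks die inside Gradio's launch workaround, which calls httpx.get(f"{self.local_url}startup-events") against the freshly started local server. The frames passing through httpcore/_sync/http_proxy.py suggest that request is being routed through a system HTTP proxy instead of hitting 127.0.0.1 directly, and the proxy then times out. A minimal probe to confirm this, assuming a recent httpx; the URL below is the local URL and endpoint taken from the log and traceback:

```python
# Probe the same startup-events endpoint Gradio hits at launch, once
# bypassing proxy environment variables and once honouring them.
import httpx

url = "http://127.0.0.1:7860/startup-events"  # local URL from the log

for label, trust_env in [("direct (no proxy env)", False), ("via proxy env", True)]:
    try:
        r = httpx.get(url, timeout=5, trust_env=trust_env)
        print(f"{label}: HTTP {r.status_code}")
    except httpx.HTTPError as exc:
        print(f"{label}: failed ({exc!r})")  # a timeout here mirrors the traceback
```

If the direct request succeeds while the proxied one times out, adding `127.0.0.1,localhost` to `NO_PROXY` (or unsetting `HTTP_PROXY`/`HTTPS_PROXY` before launching) is a reasonable workaround to try.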
System Info
Ubuntu, NVIDIA GTX 1660 Ti
Same issue: Windows 11, NVIDIA RTX 4090.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ I:\text-generation-webui\installer_files\env\Lib\contextlib.py:158 in        │
│ __exit__                                                                     │
│                                                                              │
│   157 try:                                                                   │
│ ❱ 158 self.gen.throw(typ, value, traceback)                                  │
│   159 except StopIteration as exc:                                           │
│                                                                              │
│ I:\text-generation-webui\installer_files\env\Lib\site-packages\httpx\_transp │
│ orts\default.py:86 in map_httpcore_exceptions                                │
│                                                                              │
│   85 message = str(exc)                                                      │
│ ❱ 86 raise mapped_exc(message) from exc                                      │
│   87                                                                         │
╰──────────────────────────────────────────────────────────────────────────────╯
ReadTimeout: timed out
Environment:
- Python version: 3.11.8
- text-generation-webui version: snapshot-2024-03-31
- httpcore version: 1.0.5
- httpx version: 0.27.0
- OS: Windows 11
The issue did not exist before updating text-generation-webui via update_wizard_windows.bat; it was running fine previously.
Same here. Windows 11, RTX 3070; everything worked fine before the update. On the latest as of 4/5.
Edit: it seems to have something to do with memory. When I use a 7B model it works just fine, but when I use an 8x7B it just disconnects. Note that I used those 8x7B models without issue prior to the update.
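For scale, a back-of-the-envelope estimate (a sketch under stated assumptions, not a measurement): at the ~4.57 bits per weight this Q4_K_S quant reports, the weight footprint scales roughly linearly with parameter count, so a Mixtral-style 8x7B (~46.7B total parameters, an assumed figure) needs far more memory than the 7B in the log:

```python
# Rough GGUF weight footprint at the 4.57 bits/weight reported in the log.
# The 8x7B parameter count is the commonly cited ~46.7B for Mixtral-style MoE.
BPW = 4.57

for name, params_b in [("7B", 7.24), ("8x7B (approx.)", 46.7)]:
    gib = params_b * 1e9 * BPW / 8 / 2**30
    print(f"{name:>15}: ~{gib:.1f} GiB of weights")

# 7B             : ~3.9 GiB  (matches the 3.86 GiB in the log)
# 8x7B (approx.) : ~24.8 GiB
```

That alone wouldn't explain a timeout at launch, but it could explain why behaviour differs between 7B and 8x7B models on the same machine once the layers stop fitting on the GPU.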
Just started getting this error.
I ran the update yesterday with no issue; I ran the update just now and the issue is happening.
All is fine until I send or generate a message. The console says nothing about it, just "press any key to exit".
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.