text-generation-webui
Connection timed out
Describe the bug
The issue started after updating today. I load a model, get through a few generations, and then the program crashes.
Is there an existing issue for this?
- [X] I have searched the existing issues
Reproduction
Start the web UI, load a model, and start generating; the program consistently crashes within about two minutes.
Screenshot
No response
Logs
21:59:18-930341 INFO Starting Text generation web UI
21:59:18-933081 INFO Loading the extension "gallery"
Running on local URL: http://127.0.0.1:7860
21:59:24-394228 INFO Loading "zephyr-7b-beta.Q4_K_S.gguf"
21:59:24-466515 INFO llama.cpp weights detected:
"models/zephyr-7b-beta.Q4_K_S.gguf"
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: yes
ggml_init_cublas: CUDA_USE_TENSOR_CORES: no
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1660 Ti, compute capability 7.5, VMM: yes
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from models/zephyr-7b-beta.Q4_K_S.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = huggingfaceh4_zephyr-7b-beta
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 14
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 217 tensors
llama_model_loader: - type q5_K: 8 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_K - Small
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 3.86 GiB (4.57 BPW)
llm_load_print_meta: general.name = huggingfaceh4_zephyr-7b-beta
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 2 '</s>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 70.31 MiB
llm_load_tensors: CUDA0 buffer size = 3877.55 MiB
..................................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 256.00 MiB
llama_new_context_with_model: KV self size = 256.00 MiB, K (f16): 128.00 MiB, V (f16): 128.00 MiB
llama_new_context_with_model: CUDA_Host input buffer size = 13.02 MiB
llama_new_context_with_model: CUDA0 compute buffer size = 164.00 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 8.00 MiB
llama_new_context_with_model: graph splits (measure): 2
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
Model metadata: {'tokenizer.ggml.padding_token_id': '2', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '2', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '32768', 'general.name': 'huggingfaceh4_zephyr-7b-beta', 'llama.embedding_length': '4096', 'llama.feed_forward_length': '14336', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '128', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '8', 'general.quantization_version': '2', 'tokenizer.ggml.model': 'llama', 'general.file_type': '14'}
Using fallback chat format: None
21:59:25-251158 INFO LOADER: "llama.cpp"
21:59:25-252024 INFO TRUNCATION LENGTH: 2048
21:59:25-252676 INFO INSTRUCTION TEMPLATE: "Alpaca"
21:59:25-253254 INFO Loaded the model in 0.86 seconds.
llama_print_timings: load time = 602.62 ms
llama_print_timings: sample time = 79.49 ms / 198 runs ( 0.40 ms per token, 2490.82 tokens per second)
llama_print_timings: prompt eval time = 599.52 ms / 154 tokens ( 3.89 ms per token, 256.87 tokens per second)
llama_print_timings: eval time = 7621.22 ms / 197 runs ( 38.69 ms per token, 25.85 tokens per second)
llama_print_timings: total time = 8725.49 ms / 351 tokens
Output generated in 9.07 seconds (21.73 tokens/s, 197 tokens, context 154, seed 1568613668)
Llama.generate: prefix-match hit
llama_print_timings: load time = 602.62 ms
llama_print_timings: sample time = 125.18 ms / 318 runs ( 0.39 ms per token, 2540.26 tokens per second)
llama_print_timings: prompt eval time = 275.28 ms / 32 tokens ( 8.60 ms per token, 116.25 tokens per second)
llama_print_timings: eval time = 17877.87 ms / 317 runs ( 56.40 ms per token, 17.73 tokens per second)
llama_print_timings: total time = 18998.58 ms / 349 tokens
Output generated in 19.34 seconds (16.39 tokens/s, 317 tokens, context 383, seed 1815112653)
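For reference, the throughput figures above are internally consistent; here is a minimal sanity check in Python using the numbers from the second generation. Note that the web UI's tokens/s divides generated tokens by total wall-clock time, so it is slightly lower than llama.cpp's raw eval rate:

```python
# Recompute the rates reported in the log from the raw timing values.
eval_ms, eval_runs = 17877.87, 317   # llama_print_timings: eval time
total_s, new_tokens = 19.34, 317     # web UI's "Output generated in ..." line

print(f"eval rate:    {eval_runs / (eval_ms / 1000):.2f} tokens/s")  # ~17.73
print(f"overall rate: {new_tokens / total_s:.2f} tokens/s")          # ~16.39
```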
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_transports/default.py:69 │
│ in map_httpcore_exceptions │
│ │
│ 68 try: │
│ ❱ 69 yield │
│ 70 except Exception as exc: │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_transports/default.py:233 │
│ in handle_request │
│ │
│ 232 with map_httpcore_exceptions(): │
│ ❱ 233 resp = self._pool.handle_request(req) │
│ 234 │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_sync/connection_pool.p │
│ y:216 in handle_request │
│ │
│ 215 self._close_connections(closing) │
│ ❱ 216 raise exc from None │
│ 217 │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_sync/connection_pool.p │
│ y:196 in handle_request │
│ │
│ 195 # Send the request on the assigned connection. │
│ ❱ 196 response = connection.handle_request( │
│ 197 pool_request.request │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_sync/http_proxy.py:207 │
│ in handle_request │
│ │
│ 206 ) │
│ ❱ 207 return self._connection.handle_request(proxy_request) │
│ 208 │
│ │
│ ... 1 frames hidden ... │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_sync/connection.py:76 │
│ in handle_request │
│ │
│ 75 if self._connection is None: │
│ ❱ 76 stream = self._connect(request) │
│ 77 │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_sync/connection.py:122 │
│ in _connect │
│ │
│ 121 with Trace("connect_tcp", logger, request, kwargs) │
│ ❱ 122 stream = self._network_backend.connect_tcp(**k │
│ 123 trace.return_value = stream │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_backends/sync.py:205 │
│ in connect_tcp │
│ │
│ 204 │
│ ❱ 205 with map_exceptions(exc_map): │
│ 206 sock = socket.create_connection( │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/contextlib.py:155 in __exit__ │
│ │
│ 154 try: │
│ ❱ 155 self.gen.throw(typ, value, traceback) │
│ 156 except StopIteration as exc: │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpcore/_exceptions.py:14 in │
│ map_exceptions │
│ │
│ 13 if isinstance(exc, from_exc): │
│ ❱ 14 raise to_exc(exc) from exc │
│ 15 raise # pragma: nocover │
╰──────────────────────────────────────────────────────────────────────────────╯
ConnectTimeout: [Errno 110] Connection timed out
The above exception was the direct cause of the following exception:
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/serve │
│ r.py:255 in <module> │
│ │
│ 254 # Launch the web UI │
│ ❱ 255 create_interface() │
│ 256 while True: │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/serve │
│ r.py:161 in create_interface │
│ │
│ 160 with OpenMonkeyPatch(): │
│ ❱ 161 shared.gradio['interface'].launch( │
│ 162 max_threads=64, │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/gradio/blocks.py:2106 in launch │
│ │
│ 2105 # Workaround by triggering the app endpoint │
│ ❱ 2106 httpx.get( │
│ 2107 f"{self.local_url}startup-events", verify=ssl_ver │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_api.py:198 in get │
│ │
│ 197 """ │
│ ❱ 198 return request( │
│ 199 "GET", │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_api.py:106 in request │
│ │
│ 105 ) as client: │
│ ❱ 106 return client.request( │
│ 107 method=method, │
│ │
│ ... 3 frames hidden ... │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_client.py:979 in │
│ _send_handling_redirects │
│ │
│ 978 │
│ ❱ 979 response = self._send_single_request(request) │
│ 980 try: │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_client.py:1015 in │
│ _send_single_request │
│ │
│ 1014 with request_context(request=request): │
│ ❱ 1015 response = transport.handle_request(request) │
│ 1016 │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_transports/default.py:232 │
│ in handle_request │
│ │
│ 231 ) │
│ ❱ 232 with map_httpcore_exceptions(): │
│ 233 resp = self._pool.handle_request(req) │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/contextlib.py:155 in __exit__ │
│ │
│ 154 try: │
│ ❱ 155 self.gen.throw(typ, value, traceback) │
│ 156 except StopIteration as exc: │
│ │
│ /media/mike/70c75d3e-9832-4991-b0e6-414481a4b934/text-generation-webui/insta │
│ ller_files/env/lib/python3.11/site-packages/httpx/_transports/default.py:86 │
│ in map_httpcore_exceptions │
│ │
│ 85 message = str(exc) │
│ ❱ 86 raise mapped_exc(message) from exc │
│ 87 │
╰──────────────────────────────────────────────────────────────────────────────╯
ConnectTimeout: [Errno 110] Connection timed out
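Both tracebacks die inside Gradio's launch workaround, which calls httpx.get(f"{self.local_url}startup-events") against the freshly started local server. The frames passing through httpcore/_sync/http_proxy.py suggest that request is being routed through a system HTTP proxy instead of hitting 127.0.0.1 directly, and the proxy then times out. A minimal probe to confirm this, assuming a recent httpx; the URL below is the local URL and endpoint taken from the log and traceback:

```python
# Probe the same startup-events endpoint Gradio hits at launch, once
# bypassing proxy environment variables and once honouring them.
import httpx

url = "http://127.0.0.1:7860/startup-events"  # local URL from the log

for label, trust_env in [("direct (no proxy env)", False), ("via proxy env", True)]:
    try:
        r = httpx.get(url, timeout=5, trust_env=trust_env)
        print(f"{label}: HTTP {r.status_code}")
    except httpx.HTTPError as exc:
        print(f"{label}: failed ({exc!r})")  # a timeout here mirrors the traceback
```

If the direct request succeeds while the proxied one times out, adding `127.0.0.1,localhost` to `NO_PROXY` (or unsetting `HTTP_PROXY`/`HTTPS_PROXY` before launching) is a reasonable workaround to try.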
System Info
Ubuntu, NVIDIA GTX 1660 Ti
Same issue: Windows 11, NVIDIA RTX 4090.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ I:\text-generation-webui\installer_files\env\Lib\contextlib.py:158 in        │
│ __exit__                                                                     │
│                                                                              │
│   157 try:                                                                   │
│ ❱ 158 self.gen.throw(typ, value, traceback)                                  │
│   159 except StopIteration as exc:                                           │
│                                                                              │
│ I:\text-generation-webui\installer_files\env\Lib\site-packages\httpx\_transp │
│ orts\default.py:86 in map_httpcore_exceptions                                │
│                                                                              │
│   85 message = str(exc)                                                      │
│ ❱ 86 raise mapped_exc(message) from exc                                      │
│   87                                                                         │
╰──────────────────────────────────────────────────────────────────────────────╯
ReadTimeout: timed out
Environment:
- Python version: 3.11.8
- text-generation-webui version: snapshot-2024-03-31
- httpcore version: 1.0.5
- httpx version: 0.27.0
- OS: Windows 11
The issue did not exist before updating text-generation-webui via update_wizard_windows.bat; it was running fine previously.
Same here. Windows 11, RTX 3070; everything worked fine before the update. On the latest as of 4/5.
Edit: it seems to have something to do with memory. When I use a 7B model it works just fine, but when I use an 8x7B it just disconnects. Note that I used those 8x7B models without issue prior to the update.
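For scale, a back-of-the-envelope estimate (a sketch under stated assumptions, not a measurement): at the ~4.57 bits per weight this Q4_K_S quant reports, the weight footprint scales roughly linearly with parameter count, so a Mixtral-style 8x7B (~46.7B total parameters, an assumed figure) needs far more memory than the 7B in the log:

```python
# Rough GGUF weight footprint at the 4.57 bits/weight reported in the log.
# The 8x7B parameter count is the commonly cited ~46.7B for Mixtral-style MoE.
BPW = 4.57

for name, params_b in [("7B", 7.24), ("8x7B (approx.)", 46.7)]:
    gib = params_b * 1e9 * BPW / 8 / 2**30
    print(f"{name:>15}: ~{gib:.1f} GiB of weights")

# 7B             : ~3.9 GiB  (matches the 3.86 GiB in the log)
# 8x7B (approx.) : ~24.8 GiB
```

That alone wouldn't explain a timeout at launch, but it could explain why behaviour differs between 7B and 8x7B models on the same machine once the layers stop fitting on the GPU.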
Just started getting this error.
I ran the update yesterday with no issue; I ran the update just now and the issue is happening.
All is fine until I send or generate a message. The console says nothing about it, just "press any key to exit".
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.