text-generation-inference
TGI crashes while loading Qwen2-VL-7B-Instruct
System Info
2024-11-06T04:38:58.950145Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: b1f9044d6cf082423a517cf9a6aa6e5ebd34e1c2
Docker label: sha-b1f9044
nvidia-smi:
Wed Nov 6 04:38:58 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 561.09 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 4090 On | 00000000:01:00.0 On | Off |
| 0% 36C P5 33W / 450W | 675MiB / 24564MiB | 32% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
xpu-smi:
N/A
2024-11-06T04:38:58.950634Z INFO text_generation_launcher: Args {
model_id: "/data/Qwen/Qwen2-VL-7B-Instruct",
revision: None,
validation_workers: 2,
sharded: Some(
false,
),
num_shard: None,
quantize: Some(
Eetq,
),
speculate: None,
dtype: None,
kv_cache_dtype: None,
trust_remote_code: false,
max_concurrent_requests: 5,
max_best_of: 2,
max_stop_sequences: 4,
max_top_n_tokens: 5,
max_input_tokens: Some(
4999,
),
max_input_length: None,
max_total_tokens: Some(
5000,
),
waiting_served_ratio: 0.3,
max_batch_prefill_tokens: Some(
5050,
),
max_batch_total_tokens: None,
max_waiting_tokens: 20,
max_batch_size: Some(
32,
),
cuda_graphs: None,
hostname: "36c9ccfbcab9",
port: 5025,
shard_uds_path: "/tmp/text-generation-server",
master_addr: "localhost",
master_port: 29500,
huggingface_hub_cache: None,
weights_cache_override: None,
disable_custom_kernels: false,
cuda_memory_fraction: 1.0,
rope_scaling: None,
rope_factor: None,
json_output: false,
otlp_endpoint: None,
otlp_service_name: "text-generation-inference.router",
cors_allow_origin: [],
api_key: None,
watermark_gamma: None,
watermark_delta: None,
ngrok: false,
ngrok_authtoken: None,
ngrok_edge: None,
tokenizer_config_path: None,
disable_grammar_support: false,
env: true,
max_client_batch_size: 4,
lora_adapters: None,
usage_stats: On,
}
2024-11-06T04:39:00.844343Z INFO text_generation_launcher: Disabling prefix caching because of VLM model
2024-11-06T04:39:00.844371Z INFO text_generation_launcher: Using attention flashinfer - Prefix caching 0
2024-11-06T04:39:00.844383Z INFO text_generation_launcher: Using default cuda graphs [1, 2, 4, 8, 16, 32]
2024-11-06T04:39:00.844506Z INFO download: text_generation_launcher: Starting check and download process for /data/Qwen/Qwen2-VL-7B-Instruct
2024-11-06T04:39:03.993179Z INFO text_generation_launcher: Files are already present on the host. Skipping download.
2024-11-06T04:39:04.653005Z INFO download: text_generation_launcher: Successfully downloaded weights for /data/Qwen/Qwen2-VL-7B-Instruct
2024-11-06T04:39:04.653165Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2024-11-06T04:39:06.991476Z INFO text_generation_launcher: Using prefix caching = False
2024-11-06T04:39:06.991506Z INFO text_generation_launcher: Using Attention = flashinfer
WARNING 11-06 04:39:07 ray_utils.py:46] Failed to import Ray with ModuleNotFoundError("No module named 'ray'"). For distributed inference, please install Ray with `pip install ray`.
2024-11-06T04:39:14.664235Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:39:24.673931Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:39:34.683198Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:39:44.690843Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:39:54.700389Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:40:04.710560Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:40:14.719337Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:40:24.729205Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:40:34.739375Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:40:44.747625Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:40:54.757904Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:41:04.768215Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:41:14.776315Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:41:24.786533Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:41:34.796818Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:41:44.805857Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:41:54.815583Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:42:04.825250Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:42:14.833324Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:42:24.843081Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:42:34.853696Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:42:44.861886Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:42:54.871177Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:43:04.880903Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:43:14.889243Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:43:24.899113Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:43:34.908406Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:43:44.916331Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:43:54.926150Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:44:04.936289Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:44:14.945281Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:44:24.956096Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:44:34.965929Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:44:44.974155Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:44:54.984230Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:45:04.994178Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:45:15.002185Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:45:25.011549Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:45:35.021098Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:45:45.029212Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:45:55.039089Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:46:05.048297Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:46:15.055991Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:46:25.065346Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:46:35.075354Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:46:45.084651Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:46:55.094354Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:47:05.104200Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:47:15.111799Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:47:25.121002Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:47:35.130693Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:47:45.139452Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:47:55.148870Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:48:05.158577Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:48:15.166294Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:48:25.175607Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:48:35.184943Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:48:45.193299Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:48:55.203235Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:49:05.213497Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:49:15.222554Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:49:25.231924Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:49:35.241455Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:49:45.248904Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:49:55.258542Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:50:05.269004Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:50:15.277644Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2024-11-06T04:50:17.876053Z INFO text_generation_launcher: Using experimental prefill chunking = False
2024-11-06T04:50:18.714419Z INFO text_generation_launcher: Server started at unix:///tmp/text-generation-server-0
2024-11-06T04:50:18.780985Z INFO shard-manager: text_generation_launcher: Shard ready in 674.155920438s rank=0
2024-11-06T04:50:18.817087Z INFO text_generation_launcher: Starting Webserver
2024-11-06T04:50:18.868288Z DEBUG h2::client: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/client.rs:1274: binding client connection
2024-11-06T04:50:18.868373Z DEBUG h2::client: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/client.rs:1279: client connection bound
2024-11-06T04:50:18.868775Z DEBUG h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Settings { flags: (0x0), enable_push: 0, initial_window_size: 2097152, max_frame_size: 16384 }
2024-11-06T04:50:18.871893Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Settings { flags: (0x0), initial_window_size: 4194304, max_frame_size: 4194304, max_header_list_size: 16384 }
2024-11-06T04:50:18.872201Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Settings { flags: (0x1: ACK) }
2024-11-06T04:50:18.872216Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 4128769 }
2024-11-06T04:50:18.872248Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5177345 }
2024-11-06T04:50:18.872437Z DEBUG service_discovery: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-11-06T04:50:18.872682Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Settings { flags: (0x1: ACK) }
2024-11-06T04:50:18.872699Z DEBUG Connection{peer=Client}: h2::proto::settings: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/proto/settings.rs:52: received settings ACK; applying Settings { flags: (0x0), enable_push: 0, initial_window_size: 2097152, max_frame_size: 16384 }
2024-11-06T04:50:18.873337Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.874009Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(1) }
2024-11-06T04:50:18.874043Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
2024-11-06T04:50:18.874603Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [70, 216, 154, 41, 248, 232, 176, 242] }
2024-11-06T04:50:18.874981Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [70, 216, 154, 41, 248, 232, 176, 242] }
2024-11-06T04:50:18.875679Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.875986Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Data { stream_id: StreamId(1) }
2024-11-06T04:50:18.876014Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Headers { stream_id: StreamId(1), flags: (0x5: END_HEADERS | END_STREAM) }
2024-11-06T04:50:18.876020Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5 }
2024-11-06T04:50:18.876925Z DEBUG h2::client: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/client.rs:1274: binding client connection
2024-11-06T04:50:18.876959Z DEBUG h2::client: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/client.rs:1279: client connection bound
2024-11-06T04:50:18.876974Z DEBUG h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Settings { flags: (0x0), enable_push: 0, initial_window_size: 2097152, max_frame_size: 16384 }
2024-11-06T04:50:18.877061Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5177345 }
2024-11-06T04:50:18.877174Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Settings { flags: (0x0), initial_window_size: 4194304, max_frame_size: 4194304, max_header_list_size: 16384 }
2024-11-06T04:50:18.877204Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Settings { flags: (0x1: ACK) }
2024-11-06T04:50:18.877214Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 4128769 }
2024-11-06T04:50:18.877243Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Settings { flags: (0x1: ACK) }
2024-11-06T04:50:18.877272Z DEBUG Connection{peer=Client}: h2::proto::settings: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/proto/settings.rs:52: received settings ACK; applying Settings { flags: (0x0), enable_push: 0, initial_window_size: 2097152, max_frame_size: 16384 }
2024-11-06T04:50:18.877305Z DEBUG clear_cache{batch_id=None}:clear_cache{batch_id=None}: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-11-06T04:50:18.877708Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.877739Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(1) }
2024-11-06T04:50:18.877746Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(1), flags: (0x1: END_STREAM) }
2024-11-06T04:50:18.877963Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [73, 124, 42, 86, 67, 10, 4, 118] }
2024-11-06T04:50:18.877997Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [73, 124, 42, 86, 67, 10, 4, 118] }
2024-11-06T04:50:18.878161Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=GoAway { error_code: NO_ERROR, last_stream_id: StreamId(0) }
2024-11-06T04:50:18.878197Z DEBUG Connection{peer=Client}: h2::proto::connection: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/proto/connection.rs:432: Connection::poll; connection error error=GoAway(b"", NO_ERROR, Library)
2024-11-06T04:50:18.879979Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Headers { stream_id: StreamId(1), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.880006Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Data { stream_id: StreamId(1) }
2024-11-06T04:50:18.880013Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Headers { stream_id: StreamId(1), flags: (0x5: END_HEADERS | END_STREAM) }
2024-11-06T04:50:18.880016Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5 }
2024-11-06T04:50:18.880154Z DEBUG info:info: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-11-06T04:50:18.880252Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(3), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.880287Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(3) }
2024-11-06T04:50:18.880294Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(3), flags: (0x1: END_STREAM) }
2024-11-06T04:50:18.880780Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Headers { stream_id: StreamId(3), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.880812Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Data { stream_id: StreamId(3) }
2024-11-06T04:50:18.880821Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Headers { stream_id: StreamId(3), flags: (0x5: END_HEADERS | END_STREAM) }
2024-11-06T04:50:18.880826Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=WindowUpdate { stream_id: StreamId(0), size_increment: 5 }
2024-11-06T04:50:18.880921Z INFO text_generation_router_v3: backends/v3/src/lib.rs:125: Warming up model
2024-11-06T04:50:18.881864Z DEBUG warmup{max_input_length=Some(4999) max_prefill_tokens=5050 max_total_tokens=Some(5000) max_batch_size=Some(32)}:warmup: tower::buffer::worker: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tower-0.4.13/src/buffer/worker.rs:197: service.ready=true processing request
2024-11-06T04:50:18.881984Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Headers { stream_id: StreamId(5), flags: (0x4: END_HEADERS) }
2024-11-06T04:50:18.882004Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(5) }
2024-11-06T04:50:18.882035Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Data { stream_id: StreamId(5), flags: (0x1: END_STREAM) }
2024-11-06T04:50:18.914255Z INFO text_generation_launcher: Using optimized Triton indexing kernels.
2024-11-06T04:50:18.979524Z DEBUG Connection{peer=Client}: h2::codec::framed_read: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_read.rs:405: received frame=Ping { ack: false, payload: [227, 222, 58, 95, 83, 190, 149, 210] }
2024-11-06T04:50:18.979565Z DEBUG Connection{peer=Client}: h2::codec::framed_write: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/h2-0.3.26/src/codec/framed_write.rs:213: send frame=Ping { ack: true, payload: [227, 222, 58, 95, 83, 190, 149, 210] }
2024-11-06T04:50:20.793022Z DEBUG hyper::client::service: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.31/src/client/service.rs:79: connection error: hyper::Error(Io, Custom { kind: BrokenPipe, error: "connection closed because of a broken pipe" })
2024-11-06T04:50:20.793047Z DEBUG hyper::proto::h2::client: /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/hyper-0.14.31/src/proto/h2/client.rs:326: client response error: stream closed because of a broken pipe
2024-11-06T04:50:20.793097Z ERROR warmup{max_input_length=Some(4999) max_prefill_tokens=5050 max_total_tokens=Some(5000) max_batch_size=Some(32)}:warmup: text_generation_router_v3::client: backends/v3/src/client/mod.rs:45: Server error: transport error
Error: Backend(Warmup(Generation("transport error")))
2024-11-06T04:50:20.843273Z ERROR text_generation_launcher: Webserver Crashed
2024-11-06T04:50:20.843305Z INFO text_generation_launcher: Shutting down shards
2024-11-06T04:50:20.883122Z INFO shard-manager: text_generation_launcher: Terminating shard rank=0
2024-11-06T04:50:20.883163Z INFO shard-manager: text_generation_launcher: Waiting for shard to gracefully shutdown rank=0
2024-11-06T04:50:20.983296Z INFO shard-manager: text_generation_launcher: shard terminated rank=0
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
I am using Docker Compose for my setup; the relevant parts of the service definition are:
image: ghcr.io/huggingface/text-generation-inference
container_name: llm-server
command:
  - --model-id /data/Qwen/Qwen2-VL-7B-Instruct
  - --max-batch-prefill-tokens=5050
  - --max-total-tokens=5000
  - --max-input-tokens=4999
  - --validation-workers=2
  - --max-concurrent-requests=5
  - --max-batch-size=32
  - --port=5025
  - --env
  - --sharded=false
As for the text-generation-inference Docker image, I am using the `latest` tag pulled yesterday (2024-11-05); the launcher log above shows it corresponds to commit sha-b1f9044.
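For anyone reproducing without Compose, the same launch can be approximated with a plain `docker run`. This is a sketch, not my exact setup: the volume mount path and `--gpus all` are assumptions about the host, and `--quantize eetq` is included because the Args dump in the log shows `quantize: Some(Eetq)` even though it is missing from the Compose fragment above.

```shell
# Hypothetical docker run equivalent of the Compose service above.
# /path/to/data is an assumed host directory containing Qwen/Qwen2-VL-7B-Instruct.
docker run --rm --gpus all \
  -p 5025:5025 \
  -v /path/to/data:/data \
  ghcr.io/huggingface/text-generation-inference \
  --model-id /data/Qwen/Qwen2-VL-7B-Instruct \
  --quantize eetq \
  --max-batch-prefill-tokens 5050 \
  --max-total-tokens 5000 \
  --max-input-tokens 4999 \
  --validation-workers 2 \
  --max-concurrent-requests 5 \
  --max-batch-size 32 \
  --port 5025 \
  --sharded false \
  --env
```

With this invocation the launcher reaches the warmup phase and crashes the same way as shown in the log above.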
Expected behavior
The model should finish warmup and the server should start serving requests on the configured port, instead of crashing with `Backend(Warmup(Generation("transport error")))`.
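Concretely, after a successful start I would expect a request like the following to return a generation (port taken from `--port=5025` above; the prompt text is just an example). With the current crash the server never comes up, so this request cannot be made:

```shell
# Minimal smoke test against TGI's /generate endpoint once the server is up.
curl http://localhost:5025/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Describe this image.", "parameters": {"max_new_tokens": 32}}'
```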