lorax
The server is failing to run
System Info
I am using the docker image `ghcr.io/predibase/lorax:main` (about 2 days old). This is my host nvidia info:

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA H100 80GB HBM3          Off |   00000000:55:00.0 Off |                    0 |
| N/A   37C    P0             72W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          Off |   00000000:68:00.0 Off |                    0 |
| N/A   37C    P0             72W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          Off |   00000000:D2:00.0 Off |                    0 |
| N/A   37C    P0             70W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          Off |   00000000:E4:00.0 Off |                    0 |
| N/A   37C    P0             72W /  700W |       0MiB /  81559MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
```
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [ ] An officially supported command
- [ ] My own modifications
Reproduction
Just run the docker image with these args:

```shell
docker run --gpus '"device=0,1,2,3"' \
  -p 8800:80 \
  --shm-size=150gb \
  -d \
  --name lorax \
  -v /home/dockerfiles/:/app \
  -v /home/dockerfiles/data-lorax:/data \
  ghcr.io/predibase/lorax:main \
  --model-id /app/Mixtral-8x7B-v0.1 \
  --adapter-id /app/adapters/vprn_adapter \
  --adapter-source local \
  --master-port 29400
```
Expected behavior
```
2024-08-26T19:59:50.323362Z  INFO lorax_launcher: Args { model_id: "/app/Mixtral-8x7B-v0.1", adapter_id: Some("/app/adapters/vprn_adapter"), source: "hub", default_adapter_source: None, adapter_source: "local", revision: None, validation_workers: 2, sharded: None, embedding_model: None, num_shard: None, quantize: None, compile: false, speculative_tokens: None, preloaded_adapter_ids: [], dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1024, max_total_tokens: 2048, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 4096, max_batch_total_tokens: None, max_waiting_tokens: 20, eager_prefill: None, prefix_caching: None, max_active_adapters: 1024, adapter_cycle_time_s: 2, adapter_memory_fraction: 0.1, hostname: "2203c0c58385", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29400, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], cors_allow_header: [], cors_expose_header: [], cors_allow_method: [], cors_allow_credentials: None, watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false, tokenizer_config_path: None }
2024-08-26T19:59:50.323401Z  INFO lorax_launcher: Sharding model on 4 processes
2024-08-26T19:59:50.323517Z  INFO download: lorax_launcher: Starting download process.
2024-08-26T19:59:57.513571Z  INFO lorax_launcher: weights.py:474 Files are already present on the host. Skipping download.
2024-08-26T19:59:57.513621Z  INFO lorax_launcher: weights.py:474 Files are already present on the host. Skipping download.
2024-08-26T19:59:58.333244Z  INFO download: lorax_launcher: Successfully downloaded weights.
2024-08-26T19:59:58.333887Z  INFO shard-manager: lorax_launcher: Starting shard rank=0
2024-08-26T19:59:58.333887Z  INFO shard-manager: lorax_launcher: Starting shard rank=1
2024-08-26T19:59:58.333952Z  INFO shard-manager: lorax_launcher: Starting shard rank=2
2024-08-26T19:59:58.334032Z  INFO shard-manager: lorax_launcher: Starting shard rank=3
2024-08-26T20:00:08.347181Z  INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=0
2024-08-26T20:00:08.347898Z  INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=1
2024-08-26T20:00:08.348047Z  INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=2
2024-08-26T20:00:08.349334Z  INFO shard-manager: lorax_launcher: Waiting for shard to be ready... rank=3
2024-08-26T20:00:14.389465Z ERROR lorax_launcher: server.py:287 Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/lorax-server", line 8, in <module>
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/server.py", line 274, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/__init__.py", line 221, in get_model
    return FlashMixtral(
  File "/opt/conda/lib/python3.10/site-packages/lorax_server/models/flash_mixtral.py", line 65, in __init__
    torch.distributed.barrier(group=self.process_group)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/c10d_logger.py", line 79, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 3938, in barrier
    work = group.barrier(opts=opts)
torch.distributed.DistBackendError: NCCL error in: /opt/conda/conda-bld/pytorch_1720538438429/work/torch/csrc/distributed/c10d/NCCLUtils.hpp:275, unhandled cuda error (run with NCCL_DEBUG=INFO for details), NCCL version 2.20.5
ncclUnhandledCudaError: Call to CUDA function failed.
Last error: Cuda failure 3 'initialization error'
```
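The exception itself suggests re-running with `NCCL_DEBUG=INFO` for details. A sketch of the same `docker run` from the reproduction above with that env var passed through (only the `-e NCCL_DEBUG=INFO` line is new; `NCCL_DEBUG_SUBSYS=INIT` is an optional extra to narrow the output to initialization):

```shell
# Same command as the repro, plus NCCL debug env vars so the
# shard logs show where NCCL's CUDA initialization fails.
docker run --gpus '"device=0,1,2,3"' \
  -e NCCL_DEBUG=INFO \
  -e NCCL_DEBUG_SUBSYS=INIT \
  -p 8800:80 \
  --shm-size=150gb \
  --name lorax \
  -v /home/dockerfiles/:/app \
  -v /home/dockerfiles/data-lorax:/data \
  ghcr.io/predibase/lorax:main \
  --model-id /app/Mixtral-8x7B-v0.1 \
  --adapter-id /app/adapters/vprn_adapter \
  --adapter-source local \
  --master-port 29400
```

(Running without `-d` so the NCCL diagnostics stream to the foreground.)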