Pierce Freeman
                                    @youkaichao Attached the truncated tail. It stays this way indefinitely with no additional calls written. Seems like it's either legitimately stalling out there or trying to retry the download... ```...
@youkaichao Huggingface by itself seems like it's working fine. It loads the full model in about 3.5 mins.

```python
from transformers import AutoModel, AutoTokenizer

print("Loading initial model")
model = AutoModel.from_pretrained(MODEL_DIR)
tokenizer...
```
@youkaichao That seems fine too:

```python
print("Trying to load hub...")
from huggingface_hub import HfFileSystem, snapshot_download
print("Did load hub...")
```

```
Trying to load hub...
Did load hub...
```
@youkaichao Sure thing, here's the environment with the additional logging. Based on this, it looks like the stall is happening somewhere other than the HF hub: ```...
@youkaichao Here's the full stack trace:

```
PART 0 up_helper in /usr/local/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:1429
2024-04-23 05:48:36.124556 Return from pg_to_tag in /usr/local/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:534 to _new_process_group_helper in /usr/local/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:1429
2024-04-23 05:48:36.124669 Return from _new_process_group_helper in /usr/local/lib/python3.11/site-packages/torch/distributed/distributed_c10d.py:1430...
```
I suspected there was something funky going on with NCCL. The image / hardware configuration is relatively conventional, though, so I wonder if there's something amiss at the host OS...
@majestichou At least in my case, the issue was prompted by cross-GPU coordination (NCCL in particular) on an inference box. Doesn't thus far seem to be architecture related so might...
If you're already using Chromium, this is pretty easy to do over CDP. You'll just need to use the [new](https://developer.chrome.com/articles/new-headless/) headless mode or a headful spawn, since the old headless...
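If it helps, here's a minimal sketch of that flow (the Chromium binary path and debugging port below are assumptions on my end, adjust for your setup): launch with the new headless mode plus a remote debugging port, then pull the CDP websocket endpoint from the `/json/version` discovery route.

```python
import json
import subprocess
import time
from urllib.request import urlopen

CHROME_BIN = "/usr/bin/chromium"  # assumed binary path -- point this at your install
DEBUG_PORT = 9222                 # assumed debugging port

# Launch with the *new* headless mode so CDP behaves like a headful browser.
proc = subprocess.Popen([
    CHROME_BIN,
    "--headless=new",
    f"--remote-debugging-port={DEBUG_PORT}",
    "--no-first-run",
])

# Give the browser a moment to bind the port, then fetch the CDP endpoint.
time.sleep(2)
with urlopen(f"http://127.0.0.1:{DEBUG_PORT}/json/version") as resp:
    info = json.load(resp)

print(info["webSocketDebuggerUrl"])  # hand this to whatever CDP client you're using
```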
@Yard1 I've tried llama3 on v0.4.0.post1, but this issue is still present when initializing the engine with lora adapters. The latest `main` code seems to get further in the initialization...
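For reference, a minimal sketch of a LoRA-enabled engine init along these lines (the model name and adapter path here are placeholders, not my exact repro):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder model -- substitute the checkpoint you're actually serving.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    enable_lora=True,
)

outputs = llm.generate(
    ["Hello, world"],
    SamplingParams(max_tokens=32),
    # Placeholder adapter name/path.
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)
```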
@Techinix You can either manually install the wheel from the [Release page](https://github.com/vllm-project/vllm/releases/tag/v0.4.1) or build it yourself locally from the [tagged git version](https://github.com/vllm-project/vllm/tree/v0.4.1). On my setup, a local build takes around 15-30 min.
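Roughly, either path looks like this (the wheel filename is a placeholder, grab the actual asset name from the release page):

```
# Option 1: prebuilt wheel downloaded from the v0.4.1 release page
pip install ./vllm-0.4.1-<platform-tag>.whl

# Option 2: build from the tagged source
git clone --branch v0.4.1 https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .
```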