Nick Hill
Oops, I guess we should use `torch.cat()` instead
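For context, a minimal illustration of the difference (the tensors here are placeholders, not the ones from the actual change):

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)

# torch.cat joins tensors along an existing dimension -> shape (4, 3)
joined = torch.cat([a, b], dim=0)

# torch.stack would instead insert a new dimension -> shape (2, 2, 3)
stacked = torch.stack([a, b], dim=0)
```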
Hi @OlivierDehaene, I'm actually in the middle of porting the fix from #22069 to GPT-NeoX too, since I was also interested in that one (in parallel with other things including...
Test failures look unrelated (network blips).
@rucnyz @simon-mo I'm not sure that this is the correct fix. When `params.include_stop_str_in_output` is False and `params.skip_special_tokens` is False, then you _do_ want to truncate the EOS token. When `params.skip_special_tokens`...
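A hypothetical sketch of the behaviour I'd expect (the names `output_ids` and `eos_token_id` here are illustrative, not vLLM's actual internals):

```python
def maybe_truncate_eos(output_ids, eos_token_id, params):
    # Illustrative only: drop a trailing EOS token whenever the caller
    # asked for stop strings/tokens to be excluded from the output. This
    # applies even when skip_special_tokens is False, since in that case
    # the detokenizer would otherwise surface the EOS token verbatim.
    if not params.include_stop_str_in_output and output_ids and output_ids[-1] == eos_token_id:
        return output_ids[:-1]
    return output_ids
```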
@rucnyz this should be addressed by https://github.com/vllm-project/vllm/pull/3672
@rucnyz closing this now since the issue should be resolved by #3672. Please feel free to open another PR if you still don't see the expected behaviour. Thanks for the contribution.
I'm not sure whether this would be of any help, but you can now also use TP without Ray workers for the LLM itself, by passing `distributed_executor_backend="mp"` when creating the...
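For example (the model name and TP size here are just placeholders):

```python
from vllm import LLM

# Tensor parallelism across 2 GPUs using multiprocessing workers
# instead of Ray for the distributed executor.
llm = LLM(
    model="facebook/opt-13b",  # placeholder model
    tensor_parallel_size=2,
    distributed_executor_backend="mp",
)

outputs = llm.generate("Hello, my name is")
```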
Huge thanks for all the work on this and reviews @ronensc @robertgshaw2-neuralmagic @hmellor
@James4Ever0 could you try your case again now that fix #4363 has been merged?
@youkaichao it would be good to check whether there's a non-negligible performance difference in end-to-end tests before introducing the additional complexity; it's not always easy to infer this from a microbenchmark...
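As a rough sketch of what I mean (the model, batch size, and prompt here are arbitrary placeholders):

```python
import time
from vllm import LLM, SamplingParams

# Time a realistic end-to-end batch rather than relying only on a
# microbenchmark of the changed code path.
llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(max_tokens=128)
prompts = ["Hello, my name is"] * 64

start = time.perf_counter()
llm.generate(prompts, params)
print(f"end-to-end latency: {time.perf_counter() - start:.2f}s")
```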