Nick Hill
Oops, I guess we should use `torch.cat()` instead
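For context, a minimal illustration of the difference (the tensors here are placeholders, not the ones from the actual change):

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(2, 3)

# torch.cat joins tensors along an existing dimension -> shape (4, 3)
joined = torch.cat([a, b], dim=0)

# torch.stack would instead insert a new dimension -> shape (2, 2, 3)
stacked = torch.stack([a, b], dim=0)
```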
Hi @OlivierDehaene, I'm actually in the middle of porting the fix from #22069 to GPT-NeoX too, since I was also interested in that one (in parallel with other things including...
Test failures look unrelated (network blips).
@rucnyz @simon-mo I'm not sure that this is the correct fix. When `params.include_stop_str_in_output` is False and `params.skip_special_tokens` is False, then you _do_ want to truncate the EOS token. When `params.skip_special_tokens`...
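A hypothetical sketch of the behaviour I'd expect (the names `output_ids` and `eos_token_id` here are illustrative, not vLLM's actual internals):

```python
def maybe_truncate_eos(output_ids, eos_token_id, params):
    # Illustrative only: drop a trailing EOS token whenever the caller
    # asked for stop strings/tokens to be excluded from the output. This
    # applies even when skip_special_tokens is False, since in that case
    # the detokenizer would otherwise surface the EOS token verbatim.
    if not params.include_stop_str_in_output and output_ids and output_ids[-1] == eos_token_id:
        return output_ids[:-1]
    return output_ids
```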
@rucnyz this should be addressed by https://github.com/vllm-project/vllm/pull/3672
@rucnyz closing this now since the issue should be resolved by #3672. Please feel free to open another PR if you still don't see the expected behaviour. Thanks for the contribution.
I'm not sure whether this would be of any help, but you can now also use TP without Ray workers for the LLM itself, by passing `distributed_executor_backend="mp"` when creating the...
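For example (the model name and TP size here are just placeholders):

```python
from vllm import LLM

# Tensor parallelism across 2 GPUs using multiprocessing workers
# instead of Ray for the distributed executor.
llm = LLM(
    model="facebook/opt-13b",  # placeholder model
    tensor_parallel_size=2,
    distributed_executor_backend="mp",
)

outputs = llm.generate("Hello, my name is")
```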
Huge thanks for all the work on this and reviews @ronensc @robertgshaw2-neuralmagic @hmellor
@James4Ever0 could you try your case again now that fix #4363 has been merged?
@youkaichao it would be good to check whether there's a non-negligible performance difference in end-to-end tests before introducing the additional complexity; it's not always easy to infer this from a microbenchmark...
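As a rough sketch of what I mean (the model, batch size, and prompt here are arbitrary placeholders):

```python
import time
from vllm import LLM, SamplingParams

# Time a realistic end-to-end batch rather than relying only on a
# microbenchmark of the changed code path.
llm = LLM(model="facebook/opt-125m")  # placeholder model
params = SamplingParams(max_tokens=128)
prompts = ["Hello, my name is"] * 64

start = time.perf_counter()
llm.generate(prompts, params)
print(f"end-to-end latency: {time.perf_counter() - start:.2f}s")
```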