Aaron Pham

579 comments by Aaron Pham

> Eventually I hope to switch from yapf to black (run with ruff), which has a line length of 88. Generally increasing line length is a one way move so...

But at a certain point, when we touch model_executor or v1, it is bound to disrupt everyone's PRs, right? FWIW, I think we can add the config and run this...

I'm OK with manually adding `# noqa: E501`; it's just that the codebase will end up containing a lot of these
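To illustrate what that would look like (the constant and message below are made up, just showing the shape of the suppression):

```python
# Hypothetical snippet: every line that still exceeds the configured length
# limit would have to carry this trailing suppression comment.
HELP_MESSAGE = "This request exceeded the configured maximum context length; shorten the prompt or restart the server with a larger context window."  # noqa: E501

print(HELP_MESSAGE)
```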

cc @gaocegege might be interested

> I was testing the modifications using `vllm serve` and had a quick question.

This should now be addressed. There was a bad fix somewhere, then.

We need to use the tokenizer for the reasoning parser. But if you like, I can refactor out the tokenizer change first, then rebase this change on top of that...
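Roughly why the tokenizer is needed (a sketch only, not vllm's actual reasoning-parser API; the `<think>`/`</think>` delimiters and the model name are just examples):

```python
# Sketch: the reasoning delimiters are model-specific tokens, so the parser
# has to resolve their IDs against the model's vocabulary via the tokenizer.
from transformers import AutoTokenizer


class SimpleReasoningParser:
    def __init__(self, tokenizer) -> None:
        self.tokenizer = tokenizer
        # Without the tokenizer there is no reliable way to get these IDs.
        self.start_id = tokenizer.convert_tokens_to_ids("<think>")
        self.end_id = tokenizer.convert_tokens_to_ids("</think>")

    def split(self, token_ids: list[int]) -> tuple[str, str]:
        """Split generated token IDs into (reasoning_text, final_text)."""
        if self.end_id in token_ids:
            end = token_ids.index(self.end_id)
            start = (token_ids.index(self.start_id) + 1
                     if self.start_id in token_ids else 0)
            return (self.tokenizer.decode(token_ids[start:end]),
                    self.tokenizer.decode(token_ids[end + 1:]))
        # No closing delimiter yet: treat everything decoded so far as reasoning.
        return self.tokenizer.decode(token_ids), ""


# Example wiring (model name is illustrative):
# tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
# parser = SimpleReasoningParser(tok)
```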

Seems like the failure on entrypoints is not related 😿

FYI https://github.com/vllm-project/vllm/pull/18359, but structured outputs on CPU have limited support.
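For reference, a rough sketch of exercising structured outputs against a running `vllm serve` endpoint through the OpenAI-compatible API (the model name and schema are placeholders, and the `guided_json` extra-body field is my understanding of the extension; adjust to whatever your server version exposes):

```python
from openai import OpenAI

# Assumes `vllm serve` is already running locally on the default port.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}, "population": {"type": "integer"}},
    "required": ["city", "population"],
}

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # placeholder model
    messages=[{"role": "user", "content": "Give me a city and its population as JSON."}],
    extra_body={"guided_json": schema},  # constrain output to the schema
)
print(completion.choices[0].message.content)
```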

vllm doesn't have support for mlx, so I'm not sure it is even being used there.