Aaron Pham

Results 459 comments of Aaron Pham
trafficstars

> It’s used in `model_executor.guided_decoding`. is it okay to import entrypoints there? hmm, thinking more about this, it is probably ok to add a `vllm.reasoning` (iirc we don't do any...

not sure what you mean. but wouldn't `api_key` into the client would work?

Probably you want do setup custom repo with bentovllm, then uses vLLM’s support for api key there.

hmm, sorry for the ping @whereistejas, but can you try to fix the merge conflict with the new layout change?

@jackyzha0 from #1119 seems like this is what you want?

https://github.com/aarnphm/aarnphm.github.io/blob/main/quartz/components/Reader.tsx here you can take a look at my reader component

not sure if this is a standard elsewhere, but we can follow k8s health API endpoint for this fwiw. (i also responded in the ticket)

https://kubernetes.io/docs/reference/using-api/health-checks/#individual-health-checks This is probably also related to production stack, but what I have in mind: - `/readyz` can be used to determine whether the engine is sleeping or not. -...

I don't have a strong opinion on this, but and I don't really know what the enforcement for these would often look like, so no preference at all :)