Jiaxin Shan

Results 742 comments of Jiaxin Shan

Do you mean the generation/embedding/tokenization APIs supported in vLLM (https://github.com/vllm-project/vllm/tree/main/vllm/entrypoints/openai)? The current gateway design is more of a proxy than an additional API layer. Technically, it supports any protocol engine...

Got your point, that totally makes sense. I think it should support something similar to Kubernetes Extension API services. The Batch API is a good example: currently, there doesn't seem to...

@gaocegege I see. Technically, I think it's possible. The P&D case requires such a router as well. At the same time, AIBrix also has a batch RFC (https://github.com/vllm-project/aibrix/issues/182), but due to...

This task should be part of https://github.com/vllm-project/aibrix/issues/846. As the v0.3.0 release approaches, we should finish it ASAP.

@OrdinaryCrazy any updates on the API compatibility and results comparison?

Great work! I think Varun may make up some incompatible cases later, and we still need to cut a separate documentation PR before the v0.3.0 release. I will...

1. There are some basic assumptions about the usage. Basically, LoRA will serve the "high density" use case; I wouldn't expect LoRA to be scheduled across multiple instances for most of the...

dependencies
```
The CustomResourceDefinition "envoyproxies.gateway.envoyproxy.io" is invalid: metadata.annotations: Too long: must have at most 262144 bytes
```
config
```
Error from server (Invalid): error when creating "config/default": CustomResourceDefinition.apiextensions.k8s.io "rayjobs.ray.io" is...
```

- Option 1: Update `crd:maxDescLen=0`
- Option 2: use `--server-side`
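
For context on why both options help: a client-side `kubectl apply` stores a copy of the applied manifest in the `kubectl.kubernetes.io/last-applied-configuration` annotation, so a large CRD can push `metadata.annotations` past the 262144-byte limit shown in the error above. Dropping schema descriptions shrinks the manifest, and server-side apply does not write that annotation. The sketch below is only a rough illustration in Python: the CRD is a made-up example, the helper names are hypothetical, and the size estimate approximates what kubectl would actually serialize.

```python
import json

# Limit quoted in the error message above (total annotation size, in bytes).
MAX_ANNOTATION_BYTES = 262144


def last_applied_size(manifest: dict) -> int:
    """Approximate size of the manifest when stored as a JSON annotation."""
    return len(json.dumps(manifest, separators=(",", ":")).encode("utf-8"))


def fits_client_side_apply(manifest: dict) -> bool:
    """True if the serialized manifest alone stays within the limit."""
    return last_applied_size(manifest) <= MAX_ANNOTATION_BYTES


def strip_descriptions(node):
    """Recursively drop string-valued 'description' fields.

    Only string values are removed, so a schema property that happens to be
    *named* 'description' (whose value is a dict) is kept.
    """
    if isinstance(node, dict):
        return {
            key: strip_descriptions(value)
            for key, value in node.items()
            if not (key == "description" and isinstance(value, str))
        }
    if isinstance(node, list):
        return [strip_descriptions(item) for item in node]
    return node


# Hypothetical CRD with 100 fields, each carrying a 3000-character description.
crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    "metadata": {"name": "examples.demo.example.com"},
    "spec": {
        "versions": [{
            "name": "v1",
            "schema": {"openAPIV3Schema": {
                "type": "object",
                "properties": {
                    f"field{i}": {"type": "string", "description": "x" * 3000}
                    for i in range(100)
                },
            }},
        }],
    },
}

slim_crd = strip_descriptions(crd)
print(fits_client_side_apply(crd), fits_client_side_apply(slim_crd))
```

With descriptions included, the example manifest is too large for a client-side apply; with them stripped, it fits comfortably. Other annotations on the object also count toward the limit, so treat the check as a lower bound.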