Jiaxin Shan issues

Results: 271 issues of Jiaxin Shan

Any specific optimization did in kserve to support LLM inference?

Hi community, I am wondering whether any specific optimizations have been made in KServe to support LLM applications. Is there a feature list?

kind/question

Any benefits to use gRPC over QUIC?

Hi @gfanton, thanks for open-sourcing this project. I am new to QUIC and investigating whether QUIC is beneficial to gRPC. Could you also share some insights on the benefits...

Autoscaling support in Ray-llm

Just curious: does ray-llm fully leverage Ray Serve autoscaling (https://docs.ray.io/en/latest/serve/autoscaling-guide.html)? It seems Ray Serve only supports `target_num_ongoing_requests_per_replica` and `max_concurrent_queries`. As we know, LLM output varies and these are not...

Use more generic interface to mock node resources

Currently, it only passes CPU and memory resources into cadvisor.Interface, which is not enough to mock real-world cases. https://github.com/volcano-sh/kubesim/blob/f4bd53f0b81c06f72466d981c5aabf11e044b8d1/pkg/mock/kubelet/cadvisor/testing/cadvisor_fake.go#L59-L60 We should extend this to a more generic way to...

Scope Kubeflow components in given namespace

At my current company, a few orgs/platforms would like to leverage KFP. Besides multi-user KFP, I am also evaluating whether it's possible to deploy KFP per namespace, since users are ok...

kind/question

lifecycle/stale

[Multi User] Support separate metadata for each namespace

Part of #1223; since we closed it, we need a separate issue to track this feature. Supporting separate metadata for each namespace helps us see only related artifacts/executions. Currently, MLMD...

kind/feature

upstream_issue

lifecycle/frozen

Is it possible to fetch application metrics and expose in Dapr prometheus endpoint?

Support dynamically loading Lora adapter from HuggingFace

[Doc] Fix the lora adapter path in server startup script

[Feature]: Support loading lora adapters from HuggingFace in runtime