Jiaxin Shan issues

Results 271 issues of Jiaxin Shan

[Bug]: relative path doesn't work for Lora adapter model

### Your current environment Environment Details ```text PyTorch version: 2.3.0+cu121 Is debug build: False CUDA used to build PyTorch: 12.1 ROCM used to build PyTorch: N/A OS: Ubuntu 22.04.3 LTS...

bug

[Doc]: Failed to download lora adapter using the path from documentation

### 📚 The doc issue

https://docs.vllm.ai/en/latest/models/lora.html describes the steps to load a LoRA model.

```
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
```

There're two issues...

documentation

[RFC]: Enhancing LoRA Management for Production Environments in vLLM

This RFC proposes improvements to the management of Low-Rank Adaptation (LoRA) in vLLM to make it more suitable for production environments. This proposal aims to address several pain points observed...

RFC

[Feature]: Expose Lora lineage information from /v1/models

### 🚀 The feature, motivation and pitch

```
python -m vllm.entrypoints.openai.api_server \
    --model /workspace/meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
```

The `/v1/models` response from the above setup cannot expose the...

feature request

[Core] Support Lora lineage and base model metadata management

Addresses https://github.com/vllm-project/vllm/issues/6275 and https://github.com/vllm-project/vllm/issues/6274. 1. Leverage the `root` and `parent` fields in the ModelCard to show the LoRA adapter lineage. The LoRA command supports JSON format, and the user can specify the base model...

WIP: Blog post for new static policy

This is just a placeholder PR for now. It's a blog post for https://github.com/kubernetes/enhancements/issues/4176 /cc @rashansmith

cncf-cla: yes

size/S

do-not-merge/work-in-progress

do-not-merge/hold

language/en

area/blog

[Bug]: inter-token latency is lower than TPOT in serving benchmark result

### Your current environment v0.5.2. vLLM env is not an issue so I will just skip the collection process ### 🐛 Describe the bug I am running benchmark tests and...

bug

KubeAPIWarningLogger unknown field "xxxx.creationTimestamp"

![image](https://github.com/user-attachments/assets/d523047c-156c-427a-807d-f160d87ee07a) When I create an object, I notice it surfaces some `KubeAPIWarningLogger` messages. They don't seem to affect anything, but they're annoying. This should be something common to other cases, anyone...

lifecycle/stale

[EKS] [request]: Provide Nvidia Driver installer on general AMI to replace GPU AMI

**Tell us about your request** What do you want us to build? An Nvidia Driver installer for the EKS-Optimized Linux AMI. **Which service(s) is this request for?** EKS **Tell us about...

EKS

Proposed

[RFC] Introduce new API-RayCluster Fleet and ReplicaSet in KubeRay

### Search before asking - [X] I had searched in the [issues](https://github.com/ray-project/kuberay/issues) and found no similar feature requirement. /cc Bytedancer @Basasuya @Yicheng-Lu-llll ### Description The recent release of the Llama...

enhancement

1.3.0