worker-vllm
worker-vllm copied to clipboard
Fix rope_scaling dict parsing
Fixes https://github.com/runpod-workers/worker-vllm/issues/192 by correctly parsing the expected (https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L354C5-L354C17) json to a dict instead of just passing the string
Tested by deploying to Runpod with Menlo/Jan-nano-128k and ROPE_SCALING={"rope_type":"yarn","factor":3.2,"original_max_position_embeddings":40960}
Still waiting for the build to test if it works without specifying the env var
@Code42Cate was your test without the env var also succesful?
The docker images are updated on this?
Is this now working?