Siddharth Venkatesan
You can set it as an environment variable. I will need to update our docs to reflect this configuration.
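For context, LMI containers accept any serving.properties `option.<name>` setting as an `OPTION_<NAME>` environment variable on the container. A minimal sketch of passing one at deployment time via the SageMaker Python SDK; the image URI, role, model id, and the specific `OPTION_*` variable are placeholders, since the thread doesn't name the exact setting:

```python
import sagemaker
from sagemaker.model import Model

# Placeholder IAM role and LMI container image; substitute your own.
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"
image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.30.0-lmi"

model = Model(
    image_uri=image_uri,
    role=role,
    # Any serving.properties `option.<name>` can instead be supplied as an
    # `OPTION_<NAME>` environment variable on the container.
    env={
        "HF_MODEL_ID": "TheBloke/Llama-2-7B-AWQ",  # placeholder model id
        "OPTION_ROLLING_BATCH": "vllm",
    },
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
```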
I am able to reproduce this issue with DJL 0.29.0 (vllm 0.5.3.post1) and DJL 0.30.0 (vllm 0.6.2). I am also able to reproduce this issue with vllm directly, as you...
It does seem like vLLM supports converting a regular AWQ model to marlin format within vLLM, but doesn't support supplying a model already converted to marlin format directly to vLLM. See https://github.com/vllm-project/vllm/issues/7517. Unfortunately...
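A minimal sketch of the supported path under that limitation: point vLLM at the original AWQ checkpoint and let it repack the weights into the marlin kernel format at load time (the model id here is a placeholder):

```python
from vllm import LLM, SamplingParams

# Supported path: supply the original AWQ checkpoint; on supported GPUs
# vLLM converts the AWQ weights to the marlin kernel format at load time.
# Supplying a checkpoint already saved in marlin format is what the
# linked issue reports as unsupported.
llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",  # placeholder AWQ model id
    quantization="awq_marlin",        # or omit and let vLLM auto-detect
)

outputs = llm.generate(
    ["What is quantization?"],
    SamplingParams(max_tokens=64, temperature=0.7),
)
print(outputs[0].outputs[0].text)
```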
What is the payload you are using to invoke the endpoint? We do expose generation parameters that can be included in the inference request. Details are in https://docs.djl.ai/master/docs/serving/serving/docs/lmi/user_guides/lmi_input_output_schema.html. We have...
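For reference, a sketch of a request following that schema, invoked here via boto3 against a SageMaker endpoint (the endpoint name is a placeholder): the prompt goes under `inputs`, and generation parameters go under `parameters`.

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Per the LMI input/output schema: prompt in `inputs`, generation
# parameters (e.g. max_new_tokens, temperature) in `parameters`.
payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {
        "max_new_tokens": 256,
        "temperature": 0.7,
        "do_sample": True,
    },
}

response = runtime.invoke_endpoint(
    EndpointName="my-lmi-endpoint",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```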