eByteTheDust

3 comments by eByteTheDust

I get the same first line of the error above: _"Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution"_. If I set the following environment variable, vLLM loads...

> > @youkaichao - Is this change now available in version 0.6.2? I have a requirement to load LLaMA 3.2 90B vision model across four GPUs spread across two nodes...

I was using vLLM [v0.5.0.post1] and guided generation was working great. After upgrading to vLLM [v0.6.2], the only response I get is `{ "`. Sometimes, when adding `truncate_prompt_tokens=30`, I...