Stas Bekman comments

Results: 664 comments by Stas Bekman

if no expert found in parameter that have expert in name the loop should continue

I haven't gotten to saving checkpoints yet, so I don't have the understanding of this code yet. It's interesting someone is using this old implementation! @LckyLke, we are working on...

DeepSeek-Coder-V2-Lite-Instruct Error!

seconding this Issue with autoawq-0.2.7.post3

deepspeed-v1 prep notes

@eternalNight, would it be better to discuss each of these sub-plans in a separate Issue and keep this one focused mainly on the agreed-upon tentative plans - we can...

bug: huggingface-hub 1.0.0 release breaks existing evaluate releases

switch to `evaluate==0.4.6` and it should work fine.

bug: huggingface-hub 1.0.0 release breaks existing evaluate releases

fwiw, my PR got merged in HF Transformers.

[Performance]: speed regression 0.6.2 => 0.6.3?

Thank you, @njhill. Indeed, adding `--multi-step-stream-outputs=False` brings it back into the 0.6.2 ballpark - thank you! 1. How can the users keep track of the default knobs being turned on/off...

[Performance]: speed regression 0.6.2 => 0.6.3?

Hmm, I wonder if perhaps at the very least you could maintain a single page with a few recipes for the common use cases? So as you are creating new...

[Usage]: How to use Multi-instance in Vllm? (Model replication on multiple GPUs)

It works fine with the online mode - you just create multiple servers (even reusing the same gpus!), but indeed it doesn't work with the offline mode. Here is an...

[Usage]: How to use Multi-instance in Vllm? (Model replication on multiple GPUs)

The problem seems to be in some internal state that is not being isolated, even if I do: ``` llm1 = LLM( model="meta-llama/Meta-Llama-3-8B-Instruct", tensor_parallel_size=8, gpu_memory_utilization=0.65, ) del llm1 llm2 =...

[Usage]: How to use Multi-instance in Vllm? (Model replication on multiple GPUs)

Thanks a lot for working on that, @njhill - that will help with the disaggregation type of offline use of vllm.