Cao Yu

7 comments by Cao Yu

I ran into the same response issue with the 7B model, but after changing the `torch` version to 2.1.2 and the `transformers` version to 4.40.0, the response displays correctly.

```
Loaded LLaVA model:...
```

Maybe you could use the same [requirements.txt](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/requirements.txt) as the main branch.
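A quick way to confirm the environment matches the pins that fixed the issue above is a small stdlib check. The helper name `check_pins` is mine, not part of LLaVA-NeXT; the pinned versions are the ones from the comment (torch 2.1.2, transformers 4.40.0).

```python
# Sketch: report any installed package whose version differs from the pin.
from importlib.metadata import version, PackageNotFoundError

PINS = {"torch": "2.1.2", "transformers": "4.40.0"}

def check_pins(pins):
    """Return {package: installed_version_or_None} for every mismatched pin."""
    mismatches = {}
    for pkg, wanted in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None  # package not installed at all
        if installed != wanted:
            mismatches[pkg] = installed
    return mismatches

if __name__ == "__main__":
    bad = check_pins(PINS)
    if bad:
        print("version mismatches:", bad)
```

Anything reported here is a candidate for `pip install -r requirements.txt` from the main branch.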

Just did a quick run; it should be OK with the `si` checkpoint as well.

```
Loaded LLaVA model: lmms-lab/llava-onevision-qwen2-7b-si
Loading vision tower: google/siglip-so400m-patch14-384
Model Class: LlavaQwenForCausalLM
["This image is a radar chart...
```

@zui-jiang I hit the same issue. From your logs, it looks like the longest input context before the 300 s watchdog timeout is around 13k tokens. I've increased the watchdog timeout to...

`--disable-cuda-graph` works, but it is very slow: it cuts throughput by half or more, from 40+ tokens/s to 10~20 tokens/s. Also, even after setting `--watchdog-timeout 36000`, the NCCL timeout still fires at 600 s.
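For context on why `--watchdog-timeout` does not move the 600 s limit: 600 s matches PyTorch's default NCCL process-group timeout in recent versions, which is fixed when the process group is initialized rather than by the server's watchdog flag. A config sketch (assumption: you control the `init_process_group` call, which inside a serving framework you normally don't):

```python
# Not runnable standalone: requires a CUDA machine and the usual
# MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE environment for torch.distributed.
import datetime
import torch.distributed as dist

dist.init_process_group(
    backend="nccl",
    timeout=datetime.timedelta(seconds=3600),  # lift the ~600 s NCCL default
)
```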

I'm also using 2x3090. I tried something like this; it offloads some parameters to CPU. > Some parameters are on the meta device because they were offloaded to the...
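The quoted warning is what transformers/accelerate emit when `device_map` offloads layers that don't fit in VRAM. A config sketch of capping per-GPU memory so the remainder spills to CPU; the memory figures are illustrative, and a LLaVA checkpoint may need its own loading class rather than `AutoModelForCausalLM`:

```python
# Not runnable standalone: needs transformers + accelerate installed and
# the checkpoint downloaded; numbers below are assumptions for 2x3090.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "lmms-lab/llava-onevision-qwen2-7b-si",  # model id from the run above
    device_map="auto",                       # let accelerate place layers
    max_memory={0: "22GiB", 1: "22GiB", "cpu": "64GiB"},  # spill overflow to CPU
)
```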