leiwen83
Hi @robertgshaw2-neuralmagic @cadedaniel, how is it going with the spec-decoding-related metrics? Have we reached a conclusion on how to make it happen? ;) The metrics are critical to us...
@simon-mo @robertgshaw2-neuralmagic RFC https://github.com/vllm-project/vllm/issues/4873 has been created.
> Scope or not, there's no point in porting over FastChat's Python controller implementation, it's literally 1000x slower at scale than 1 day's worth of Rust code.

Yep, Rust also...
> I was using FastChat previously, and now plan to use vLLM and Ray Serve for LLM inference; it seems to also be working well. So ray-llm is not a dependency of mine now...
The FastAPI change may not be enough... FastChat implements a controller that tracks the status of all workers, which is what makes worker registration possible.
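For illustration, here is a minimal sketch of that controller/registration pattern. The endpoint names and payload fields are hypothetical, not FastChat's actual API; the point is just that workers register and heartbeat, and the controller evicts anything stale:

```python
# Hypothetical sketch of a FastChat-style controller: workers register
# themselves, send periodic heartbeats, and silent workers are dropped.
import time
from fastapi import FastAPI
from pydantic import BaseModel

HEARTBEAT_TTL = 90  # seconds before a silent worker is considered dead

app = FastAPI()
workers: dict[str, dict] = {}  # worker_addr -> {"models": [...], "last_seen": ts}

class RegisterRequest(BaseModel):
    worker_addr: str
    models: list[str]

class HeartbeatRequest(BaseModel):
    worker_addr: str

@app.post("/register_worker")
def register_worker(req: RegisterRequest):
    workers[req.worker_addr] = {"models": req.models, "last_seen": time.time()}
    return {"ok": True}

@app.post("/heartbeat")
def heartbeat(req: HeartbeatRequest):
    if req.worker_addr in workers:
        workers[req.worker_addr]["last_seen"] = time.time()
        return {"ok": True}
    return {"ok": False}  # unknown worker: it should re-register

@app.get("/list_workers")
def list_workers():
    now = time.time()
    # Evict workers whose last heartbeat is older than the TTL.
    alive = {a: w for a, w in workers.items() if now - w["last_seen"] < HEARTBEAT_TTL}
    workers.clear()
    workers.update(alive)
    return alive
```

A plain FastAPI server can expose the same inference endpoints, but without a registry like this there is nothing tracking which workers exist or whether they are still alive.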
@Dominic789654 you may try my latest PR https://github.com/microsoft/DeepSpeed/pull/3629. This patch allows loading the checkpoint serially, so it does not cause a memory peak when resuming from the...
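The actual change lives in the DeepSpeed PR above; as a generic sketch of the idea (not the patch's code), serializing the loads means ranks take turns materializing their shard, so peak host memory is one shard rather than all of them at once:

```python
# Generic sketch of serialized checkpoint loading, assuming an already
# initialized torch.distributed process group. Only one rank holds its
# freshly loaded checkpoint in host memory at any moment.
import torch
import torch.distributed as dist

def load_checkpoint_serially(path: str):
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    state = None
    for turn in range(world_size):
        if rank == turn:
            # Only the current rank reads the file into CPU memory now.
            state = torch.load(path, map_location="cpu")
        dist.barrier()  # everyone waits until the current rank is done
    return state
```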
I see. The current sample code only shows the usage for matmul. What about conv? Is there any reference code?
Currently doing instruction tuning on 8x 3090s, and on a machine with 256 GB of RAM the memory blows up immediately. `accelerate launch --config_file configs/default_config.yaml instruction_tuning.py` Can this be made to limit the number of GPUs?
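For reference, a common way to cap the GPU count with Accelerate, assuming the config file does not pin the devices itself, is to restrict the visible devices and the process count, e.g. `CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --num_processes 4 --config_file configs/default_config.yaml instruction_tuning.py`.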
Are these numbers from training? For full-parameter finetuning, would the requirements be about the same? If using LoRA finetuning, are the corresponding numbers available?
Does batch size have a large impact on resource requirements? For example, between b=1 and b=2, do the corresponding resource requirements grow linearly?