leiwen83

Results 39 comments of leiwen83

Hi @robertgshaw2-neuralmagic @cadedaniel, how is it going with the spec-related metrics? Have we reached a conclusion on how to make them happen? ;) The metrics are critical to us...

@simon-mo @robertgshaw2-neuralmagic RFC https://github.com/vllm-project/vllm/issues/4873 has been created

> Scope or not, there's no point in porting over FastChat's Python controller implementation; it's literally 1000x slower at scale than one day's worth of Rust code.

Yep, Rust also...

> I was using FastChat previously, and now I plan to use vLLM and Ray Serve for LLM inference; that seems to be working well too. So ray-llm is not a dependency of my project now...

The fastapi change may not be enough... FastChat implements a controller that tracks the status of all workers, which is what makes worker registration possible.
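For context, here is a minimal sketch of that controller pattern (the endpoint and variable names are illustrative, not FastChat's actual code): workers register themselves and send periodic heartbeats, so the controller always knows which workers are alive and can route requests to them.

```python
# Minimal worker-registry controller sketch (names are illustrative).
import time
from fastapi import FastAPI

app = FastAPI()
workers = {}  # worker address -> timestamp of last heartbeat

@app.post("/register_worker")
def register_worker(addr: str):
    workers[addr] = time.time()
    return {"ok": True}

@app.post("/heartbeat")
def heartbeat(addr: str):
    workers[addr] = time.time()
    return {"ok": True}

@app.get("/list_workers")
def list_workers(timeout: float = 60.0):
    # Only workers that sent a heartbeat recently count as alive.
    now = time.time()
    return [a for a, t in workers.items() if now - t < timeout]
```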

@Dominic789654 you may try my latest PR https://github.com/microsoft/DeepSpeed/pull/3629. This patch allows loading the checkpoint serially, so it does not cause a memory peak when resuming from the...
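The gist of the approach, as a sketch under assumed names (not DeepSpeed's actual API): ranks take turns deserializing the checkpoint so only one full copy sits in host memory at a time, instead of world_size copies being materialized simultaneously.

```python
# Sketch: serialize checkpoint loading across ranks to flatten the memory peak.
import torch
import torch.distributed as dist

def load_checkpoint_serially(path: str):
    rank, world = dist.get_rank(), dist.get_world_size()
    state = None
    for turn in range(world):
        if rank == turn:
            state = torch.load(path, map_location="cpu")
            # In a real partitioned setup, keep only this rank's shard here
            # and free the rest before the next rank starts loading.
        dist.barrier()  # rank turn+1 begins only after rank turn finishes
    return state
```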

I see. The current sample code only shows the usage for matmul. How about conv? Is there any reference code?

Currently running instruction tuning on 8x 3090s, and on a machine with 256 GB of RAM the memory blows up immediately.
accelerate launch --config_file configs/default_config.yaml instruction_tuning.py
Can this limit the number of GPUs?
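For reference, one way to cap the GPU count is to mask devices before torch/accelerate initialize (a sketch; the device ids below are only an example, and if memory serves, accelerate launch also accepts a --num_processes flag for the same purpose):

```python
# Mask GPUs before any CUDA-aware import so the run sees only 4 of 8 cards.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # example ids

import torch  # must be imported after the mask is set
print(torch.cuda.device_count())  # -> 4 on an 8-GPU machine
```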

Are these numbers from training? For a full finetune, would the requirements be roughly the same? And if using LoRA finetuning, are there corresponding numbers?

Does batch size have a large impact on resource requirements? For example, between b=1 and b=2, do the corresponding resources grow linearly?
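A rough way to check this empirically (a toy linear layer stands in for the real model; the expectation is that weights are fixed while activation memory scales roughly linearly with batch size, so the total should grow sub-linearly from b=1 to b=2):

```python
import torch

model = torch.nn.Linear(4096, 4096).cuda()
for b in (1, 2, 4):
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(b, 4096, device="cuda", requires_grad=True)
    model(x).sum().backward()
    model.zero_grad(set_to_none=True)
    print(f"b={b}: peak {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
```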