leiwen83
Hi @robertgshaw2-neuralmagic @cadedaniel, how is it going with the spec-decoding-related metrics? Have we reached a conclusion on how to make it happen? ;) The metrics are critical to us...
@simon-mo @robertgshaw2-neuralmagic RFC https://github.com/vllm-project/vllm/issues/4873 has been created.
> Scope or not, there's no point in porting over FastChat's Python controller implementation, it's literally 1000x slower at scale than 1 day's worth of Rust code.

Yep, Rust also...
> I was using FastChat previously, and now plan to use vLLM and Ray Serve for LLM inference; it seems to also be working well. So ray-llm is not a dependency of mine now...
The FastAPI change may not be enough... FastChat implements a controller that tracks the status of all workers, which is what makes worker registration possible.
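For illustration, here is a minimal sketch of that controller/registration pattern. The endpoint names and payload fields are hypothetical, not FastChat's actual API; the point is just that workers register and heartbeat, and the controller evicts anything stale:

```python
# Hypothetical sketch of a FastChat-style controller: workers register
# themselves, send periodic heartbeats, and silent workers are dropped.
import time
from fastapi import FastAPI
from pydantic import BaseModel

HEARTBEAT_TTL = 90  # seconds before a silent worker is considered dead

app = FastAPI()
workers: dict[str, dict] = {}  # worker_addr -> {"models": [...], "last_seen": ts}

class RegisterRequest(BaseModel):
    worker_addr: str
    models: list[str]

class HeartbeatRequest(BaseModel):
    worker_addr: str

@app.post("/register_worker")
def register_worker(req: RegisterRequest):
    workers[req.worker_addr] = {"models": req.models, "last_seen": time.time()}
    return {"ok": True}

@app.post("/heartbeat")
def heartbeat(req: HeartbeatRequest):
    if req.worker_addr in workers:
        workers[req.worker_addr]["last_seen"] = time.time()
        return {"ok": True}
    return {"ok": False}  # unknown worker: it should re-register

@app.get("/list_workers")
def list_workers():
    now = time.time()
    # Evict workers whose last heartbeat is older than the TTL.
    alive = {a: w for a, w in workers.items() if now - w["last_seen"] < HEARTBEAT_TTL}
    workers.clear()
    workers.update(alive)
    return alive
```

A plain FastAPI server can expose the same inference endpoints, but without a registry like this there is nothing tracking which workers exist or whether they are still alive.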
@Dominic789654 you may try my latest PR https://github.com/microsoft/DeepSpeed/pull/3629. This patch allows loading the checkpoint serially, so it does not cause a memory peak when resuming from the...
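The actual change lives in the DeepSpeed PR above; as a generic sketch of the idea (not the patch's code), serializing the loads means ranks take turns materializing their shard, so peak host memory is one shard rather than all of them at once:

```python
# Generic sketch of serialized checkpoint loading, assuming an already
# initialized torch.distributed process group. Only one rank holds its
# freshly loaded checkpoint in host memory at any moment.
import torch
import torch.distributed as dist

def load_checkpoint_serially(path: str):
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    state = None
    for turn in range(world_size):
        if rank == turn:
            # Only the current rank reads the file into CPU memory now.
            state = torch.load(path, map_location="cpu")
        dist.barrier()  # everyone waits until the current rank is done
    return state
```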
I see. The current sample code only shows the usage for matmul. What about conv? Is there any reference code?
Currently doing instruction tuning on 8x 3090s, and on a machine with 256 GB of RAM the memory blows up immediately. `accelerate launch --config_file configs/default_config.yaml instruction_tuning.py` Can this be made to limit the number of GPUs?
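For reference, a common way to cap the GPU count with Accelerate, assuming the config file does not pin the devices itself, is to restrict the visible devices and the process count, e.g. `CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --num_processes 4 --config_file configs/default_config.yaml instruction_tuning.py`.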
Are these numbers from training? For full-parameter finetuning, would the requirements be about the same? If using LoRA finetuning, are the corresponding numbers available?
Does batch size have a large impact on resource requirements? For example, between b=1 and b=2, do the corresponding resource requirements grow linearly?