Gavin Li
will find some time to add the support
Fixed in airllm==2.9.1, please try it.
Weird. I just tried it, and the code is different on my side. Can you try again and make sure you have 2.9.1? `pip list | grep airllm`
Yes, these two parameter names are a bit confusing. I'll change them later.
> Also, I see that when computing the loss, the prompt + query + response are all included. Why is that? We thought only the loss on the response should be computed.

In most cases this has little impact on performance. I can add a parameter later to disable the instruction part of the loss; see the sketch below.
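For reference, a minimal sketch of response-only loss masking in the usual Hugging Face / PyTorch style, assuming the prompt length in tokens is known (`mask_prompt_labels` and `prompt_len` are illustrative names, not part of airllm):

```python
import torch

def mask_prompt_labels(input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Build labels so that loss is computed only on response tokens.

    Tokens belonging to the prompt/instruction are set to -100, the
    ignore index used by torch.nn.CrossEntropyLoss and by Hugging Face
    causal-LM models when they compute the loss from `labels`.
    """
    labels = input_ids.clone()
    labels[:, :prompt_len] = -100  # skip loss on prompt + query tokens
    return labels
```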
For inference we use H100/A100 80G; we haven't tested other hardware yet. With an H100 or A100 it is very fast. vLLM should work in theory; I can test it later. But my understanding is that vLLM optimizes for throughput, so it depends on whether you need to optimize throughput or latency.
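As an illustration of the throughput-oriented path, here is a minimal vLLM offline-batching sketch (the model id and prompts are placeholders, not the checkpoint discussed in this thread):

```python
from vllm import LLM, SamplingParams

# Placeholder model id; substitute the checkpoint you are serving.
llm = LLM(model="meta-llama/Llama-2-7b-hf")
params = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM batches these prompts internally (continuous batching),
# which favors aggregate throughput over per-request latency.
outputs = llm.generate(["Hello, my name is", "The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```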
> The changes for flux2 caused a slight conflict, could you take a look? Thanks!

Sure. I've resolved the conflicts. Thanks.
I think with LoRAs the data distribution might change dramatically, and int4 quantization is very sensitive to outliers. So maybe loading LoRAs directly causes the error accumulation...
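If that is indeed the issue, one common workaround is to merge the LoRA into the base weights in full precision first and only quantize afterwards, rather than applying the adapter on top of already-quantized int4 weights. A hedged sketch using PEFT (model and adapter paths are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Placeholder paths; substitute your own base model and adapter.
base = AutoModelForCausalLM.from_pretrained("base-model", torch_dtype="auto")
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")

# Fold the LoRA deltas into the base weights in full precision,
# so any outliers are captured before int4 rounding happens.
merged = model.merge_and_unload()
merged.save_pretrained("merged-model")
```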