abpani

Results: 23 comments of abpani

@SunMarc Funny thing is, this does not happen with Mistral models; `device_map="balanced"` works fine for them. But with Qwen, Phi, and Llama I still hit the same issue.

> Hey @abpani, the final allocation looks very strange indeed. Can you try with `device_map="sequential"` and set `max_memory`? Also, what do you mean by `it shows different...