gladichl


Does this method split a model (like HiDream) across GPUs? For example, instead of needing 24 GB, could it theoretically run on 12 GB + 12 GB (with quantization and a resolution reduction)? Or is it just sequential parallelism?
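For context, here is a minimal sketch of the kind of memory split the question describes: layer-wise placement across two cards using Hugging Face Accelerate's `infer_auto_device_map` and `dispatch_model`. The placeholder model, the 12 GiB budgets, and the input shape are illustrative assumptions, not HiDream's actual architecture or the method discussed in this thread.

```python
# Sketch: split a large model's layers across two 12 GB GPUs so neither card
# has to hold the whole thing. Placeholder model, not HiDream-specific.
import torch
import torch.nn as nn
from accelerate import dispatch_model, infer_auto_device_map

# Stand-in for a big transformer; substitute the real model here.
model = nn.Sequential(
    *[nn.TransformerEncoderLayer(d_model=1024, nhead=16) for _ in range(24)]
)

# Ask Accelerate to plan a split that fits each GPU's memory budget.
device_map = infer_auto_device_map(
    model,
    max_memory={0: "12GiB", 1: "12GiB"},  # two hypothetical 12 GB cards
)

# Place each block on its assigned device; hooks move activations between
# GPUs whenever the forward pass crosses the split point.
model = dispatch_model(model, device_map=device_map)

with torch.no_grad():
    x = torch.randn(16, 1, 1024)   # dummy (seq, batch, d_model) input
    out = model(x)                 # executes across both GPUs in sequence
```

Note that a split like this only reduces the per-card memory footprint; the forward pass still runs through the GPUs one after another, which is exactly the "sequential parallelism" distinction the question is drawing. Overlapping work across GPUs (tensor or pipeline parallelism) is what requires the high inter-GPU bandwidth mentioned in the reply below.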

> This requires high bandwidth between GPUs; otherwise @Nerogar's very efficient async RAM offloading implementation is just better. If you have very high bandwidth between GPUs, such as NVLink, ...