phalexo
It may be helpful for people to know that the different stages can be placed on different GPUs, which is useful for those who have several GPUs with smaller VRAM.
If by lighter you mean requiring less VRAM, then you can try setting the dtype to float16. T5 is then about 11.6 GiB.
Approximate VRAM usage:

- T5: about 11.6 GiB
- IF-I: about 9.2 GiB
- IF-II + IF-III: about 5.8 GiB together, roughly 3 GiB each separately
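As a rough sanity check on figures like these (a minimal sketch; the 5B parameter count below is a placeholder, not the actual size of any of these models), weight memory scales linearly with bytes per parameter, so float16 halves the float32 footprint:

```python
# Rough VRAM estimate from parameter count alone. Activations,
# text-encoder buffers, and CUDA context overhead are ignored,
# so real usage is somewhat higher than this lower bound.
def weight_gib(n_params: int, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB (2**30 bytes)."""
    return n_params * bytes_per_param / 2**30

# Hypothetical 5B-parameter model:
fp32 = weight_gib(5_000_000_000, 4)  # ~18.6 GiB
fp16 = weight_gib(5_000_000_000, 2)  # ~9.3 GiB, exactly half of fp32
```

This is why switching the dtype to float16 cuts the per-stage footprint roughly in half, before any overhead.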
If stages are not callable, then it is not clear to me which parts are callable. Did you manage to compile anything?
I just want the inference to be faster. Can you paste the snippet of code that worked for you?
Regardless of where I put it, it either complains about parallelism or simply hangs. Where exactly did you put it: a `torch.compile` call or the `@torch.compile` decorator?
Not quite as crisp as the image above.

This kind of looks ok. 
> @phalexo: I'm running all the full models at full resolution on a 48 GB VRAM RTX A6000 instance on RunPod[1]. What are you using?
>
> [1] This is not...