tomesd
No reduction in graphics memory
1. GPU: A100-80G
test-code:

2. result

3. conclusion and problems

(1) tomesd: the higher the token merge ratio, the faster the inference, 8.71s -> 5.92s (ratio 0 -> ratio 0.5). However, there has been no significant change in graphics memory usage, which is inconsistent with the paper. I would like to understand the possible reasons.

(2) The inference speed of xformers + tomesd (token merge ratio > 0.3) is better than using xformers alone.
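The issue doesn't include the actual test script, but a typical setup for this kind of benchmark with diffusers might look like the sketch below (the model name, dtype, and default ratio are assumptions, not taken from the issue):

```python
# Hypothetical sketch: patching a diffusers pipeline with ToMe.
# Requires `pip install tomesd diffusers` and a CUDA GPU to actually run;
# the heavy imports are kept inside the function so it can be defined anywhere.

def build_patched_pipeline(ratio: float = 0.5):
    """Load Stable Diffusion and apply ToMe token merging at `ratio`."""
    import torch
    import tomesd
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # assumed checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")
    pipe.enable_xformers_memory_efficient_attention()  # optional: xformers
    tomesd.apply_patch(pipe, ratio=ratio)  # merge `ratio` of the tokens
    return pipe
```

Timing the same prompt at ratio 0 vs. 0.5 (and with/without the xformers line) would reproduce the comparison described above.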
All the benchmarks in the paper were done using the original stable diffusion repo, not diffusers (and diffusers may give different results). Also, it's hard to get an accurate memory usage reading with pytorch. The way I estimated it in the paper was: using the stable diffusion repo, I generated 512x512 images and increased the batch size until I ran out of memory. The memory per image was then the total memory allocated divided by the maximum batch size that fit before running out of memory.
If you don't do it like that, the problem is that pytorch will very often allocate more memory than it needs, and it's hard to find out how much memory the network is actually using.
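The procedure described above can be sketched as follows. The GPU step is replaced by a stand-in `try_batch` with a made-up per-image cost, purely so the logic runs anywhere; on real hardware it would run the pipeline and catch `torch.cuda.OutOfMemoryError`:

```python
# Sketch of the memory-per-image estimation described above.
# The numbers here are illustrative assumptions, not measurements.

TOTAL_MEMORY_GB = 80.0    # e.g. an A100-80G
PER_IMAGE_COST_GB = 2.5   # hidden "true" cost used only by the stand-in

def try_batch(batch_size: int) -> bool:
    """Stand-in: does a 512x512 batch of this size fit in memory?"""
    return batch_size * PER_IMAGE_COST_GB <= TOTAL_MEMORY_GB

def memory_per_image(total_memory_gb: float) -> float:
    # Increase the batch size until the next batch would "run out of memory"...
    batch = 1
    while try_batch(batch + 1):
        batch += 1
    # ...then divide total memory by the largest batch size that fit.
    return total_memory_gb / batch

print(memory_per_image(TOTAL_MEMORY_GB))  # 80 GB / 32 images = 2.5
```

Measuring this way sidesteps pytorch's caching allocator, which otherwise reports more memory than the network actually needs.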