Why is the VAE decoder so slow? Can you help me?
It seems like most of the delay is in synchronization, which implies the slowdown is actually something earlier in the code. The way torch works, each op runs asynchronously: every op is queued for the GPU and executed later, so a synchronize call ends up waiting on all the previously queued work too. Since synchronization is what's taking the longest, try synchronizing before the decode and then timing the decode step by itself, to make sure the time isn't coming from something else.
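Something like this, where `vae`, `latents`, and `vae.decode` are placeholders for whatever your pipeline actually uses:

```python
import time

import torch

# Minimal timing sketch; `vae` and `latents` are assumed to already be on the GPU.
torch.cuda.synchronize()              # drain any work queued by earlier ops
start = time.perf_counter()
with torch.no_grad():
    image = vae.decode(latents)       # substitute your real decode call
torch.cuda.synchronize()              # wait for the decode kernels themselves to finish
print(f"decode: {time.perf_counter() - start:.3f} s")
```

If the decode is fast when measured this way, the "slow synchronization" is just paying for earlier asynchronous work.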
Actually, it could be because you have autoencoder offloading set to true. In that case the slowdown could be moving the VAE to the GPU, decoding, and then moving the VAE back to the CPU.
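A rough way to check whether the transfers are the cost (same placeholder names `vae`, `latents`, and `vae.decode` as above) is to time the moves and the decode separately:

```python
import time

import torch

def timed(label, fn):
    # Synchronize on both sides so each phase is measured in isolation.
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    out = fn()
    torch.cuda.synchronize()
    print(f"{label}: {time.perf_counter() - t0:.3f} s")
    return out

timed("vae -> gpu", lambda: vae.to("cuda"))
with torch.no_grad():
    image = timed("decode", lambda: vae.decode(latents))
timed("vae -> cpu", lambda: vae.to("cpu"))
```

If the two transfers dominate, offloading is the problem rather than the decoder itself.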
I tried not offloading the autoencoder, but the decoder is still just as slow. Most of the decode time is in upsampling; it takes 4 to 5 seconds. My test machine is an L4.
for i_level in reversed(range(self.num_resolutions)):
    for i_block in range(self.num_res_blocks + 1):
        h = self.up[i_level].block[i_block](h)
        if len(self.up[i_level].attn) > 0:
            h = self.up[i_level].attn[i_block](h)
    if i_level != 0:
        h = self.up[i_level].upsample(h)
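One way to see which level dominates is to wrap the loop with synchronized timers, roughly like this (only the timing lines are new, the rest mirrors the loop above):

```python
import time

import torch

for i_level in reversed(range(self.num_resolutions)):
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for i_block in range(self.num_res_blocks + 1):
        h = self.up[i_level].block[i_block](h)
        if len(self.up[i_level].attn) > 0:
            h = self.up[i_level].attn[i_block](h)
    if i_level != 0:
        h = self.up[i_level].upsample(h)
    torch.cuda.synchronize()            # attribute the kernels to the right level
    print(f"level {i_level}: {time.perf_counter() - t0:.3f} s")
```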
I'm not entirely sure what the slowdown would be, though an L4 has a pretty low power limit, so it might be throttling because of that. I would check the clock speeds as it's decoding to see whether they drop significantly.
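A quick way to watch that from Python is NVML through the `pynvml` bindings (assuming `nvidia-ml-py` is installed and the L4 is device 0); run it in a separate process while the decode is going:

```python
import time

import pynvml  # pip install nvidia-ml-py

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # assumes the L4 is GPU 0
for _ in range(20):
    sm_clock = pynvml.nvmlDeviceGetClockInfo(handle, pynvml.NVML_CLOCK_SM)  # MHz
    power = pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0                 # watts
    print(f"SM clock: {sm_clock} MHz, power: {power:.1f} W")
    time.sleep(0.5)
pynvml.nvmlShutdown()
```

`nvidia-smi --query-gpu=clocks.sm,power.draw --format=csv -l 1` shows the same thing from the shell. If the SM clock sags while power sits at the card's limit, the decode is power-throttled rather than anything wrong in the code.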