
LDM optimization patches

Open drhead opened this issue 1 year ago • 4 comments

Description

Change 1: Timestep Embedding Patch

  • Fixes a blocking op in the timestep embedding: it created a tensor on the CPU and then moved it to the GPU, which forced a sync every step.
  • Combined with the other performance PRs (mine and HCL's), this should leave Torch's dispatch queue completely unblocked (until extensions with similar problems block it again), allowing near-constant 100% GPU utilization.
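A minimal sketch of the idea, assuming the standard sinusoidal embedding used by LDM's `timestep_embedding` (cosine and sine halves, `max_period=10000`). It is not the actual patch. The point is that the frequency table is allocated with `device=timesteps.device`, so no CPU tensor is built and no host-to-device copy is queued on each step.

```python
import math

import torch


def timestep_embedding(timesteps: torch.Tensor, dim: int, max_period: int = 10000) -> torch.Tensor:
    """Sinusoidal timestep embedding with every intermediate created on timesteps.device."""
    half = dim // 2
    # Build the frequency table directly on the target device instead of
    # creating it on CPU and calling .to(device), which forces a transfer.
    freqs = torch.exp(
        -math.log(max_period)
        * torch.arange(half, dtype=torch.float32, device=timesteps.device)
        / half
    )
    args = timesteps[:, None].float() * freqs[None]
    embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
    if dim % 2:
        # Pad odd dims with a zero column so the output has exactly `dim` features.
        embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
    return embedding


t = torch.tensor([0, 500, 999])
print(timestep_embedding(t, 320).shape)
```

With a CUDA `timesteps` tensor, every op in this function stays on the GPU, so the host never has to wait on a copy.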

Change 2: SpatialTransformer.forward einops removal

  • Changes the function to use native torch reshape/view/permute ops and removes the .contiguous() call.
  • Eliminates 32 calls to aten::copy_ and void at::native::elementwise_kernel<128, 4, at::nati... per forward pass (SD 1.5). The speedup appears to be around 6-8 ms per forward pass, though my profiler's timings are somewhat inconsistent (512x512, batch size 4, overclocked 3090).
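The einops calls being replaced are, roughly and assuming the usual LDM layout, `rearrange(x, 'b c h w -> b (h w) c').contiguous()` and its inverse. Below is a sketch of equivalent native-torch helpers with hypothetical names `to_tokens`/`from_tokens`, plus a hypothetical `count_copy_calls` helper that uses `torch.profiler` to count `aten::copy_` dispatches on CPU. It illustrates why dropping `.contiguous()` removes the copies; it is not the code from this PR.

```python
import torch
from torch.profiler import ProfilerActivity, profile


def to_tokens(x: torch.Tensor) -> torch.Tensor:
    """(b, c, h, w) -> (b, h*w, c); same values as rearrange(x, 'b c h w -> b (h w) c')."""
    b, c, h, w = x.shape
    # view merges h and w without copying (valid for a contiguous NCHW input);
    # permute only reorders strides, so the result is a non-contiguous view.
    return x.view(b, c, h * w).permute(0, 2, 1)


def from_tokens(x: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """(b, h*w, c) -> (b, c, h, w); same values as rearrange(x, 'b (h w) c -> b c h w', h=h, w=w)."""
    b, _, c = x.shape
    # Splitting one dimension (h*w) into two (h, w) can always be done as a view.
    return x.permute(0, 2, 1).view(b, c, h, w)


def count_copy_calls(fn, x: torch.Tensor) -> int:
    """Hypothetical helper: number of aten::copy_ calls recorded while running fn(x) on CPU."""
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        fn(x)
    return sum(evt.count for evt in prof.key_averages() if evt.key == "aten::copy_")


x = torch.randn(2, 320, 32, 32)
# Old pattern: materializing the permuted layout with .contiguous() dispatches a copy.
print(count_copy_calls(lambda t: to_tokens(t).contiguous(), x))
# New pattern: only views, so no copy kernel is dispatched.
print(count_copy_calls(to_tokens, x))
```

The trade-off is that the transformer blocks now receive a non-contiguous `(b, h*w, c)` tensor, which standard PyTorch layers accept.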


drhead, May 17 '24 16:05

I think #18620 might need to be merged before tests will pass on this.

drhead, May 17 '24 16:05

  • we are currently on #15824

so we need to wait 2796 new posts to merge this 🙃

w-e-w, May 17 '24 16:05

Upon further review I think it would be sufficient for #15820 to be merged first lol

drhead, May 17 '24 16:05

Added another patch, and it passes tests now.

drhead, May 17 '24 17:05