TensorRT-LLM
Support SDXL and its distributed inference
The idea of patch parallelism comes from the CVPR 2024 paper DistriFusion. To keep the implementation simple, all communication in this example is synchronous.
This can help SDXL achieve better performance, especially at very high resolutions.
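For reference, here is a minimal sketch of what one synchronous patch-parallel denoising step can look like with `torch.distributed`. The function and argument names (`unet`, `cond`, and so on) are illustrative assumptions, not the actual API of this example, and the cross-patch context handling that DistriFusion adds (reusing slightly stale activations) is omitted:

```python
# Minimal sketch of synchronous patch parallelism (illustrative names,
# not this example's real API). Assumes the process group is already
# initialized (e.g. launched via torchrun with the NCCL backend) and
# that the latent height is divisible by the world size.
import torch
import torch.distributed as dist

def denoise_step_patch_parallel(unet, latent, t, cond):
    rank = dist.get_rank()
    world = dist.get_world_size()

    # Split the latent (B, C, H, W) along the height dim: one patch per GPU.
    my_patch = latent.chunk(world, dim=2)[rank].contiguous()

    # Each rank runs the denoiser on its own patch only. A faithful
    # implementation also needs cross-patch context (DistriFusion reuses
    # slightly stale activations for this); that is omitted here.
    my_out = unet(my_patch, t, cond)

    # Blocking all_gather: every rank waits for every patch before the
    # next step, matching the fully synchronous design of this PR.
    gathered = [torch.empty_like(my_out) for _ in range(world)]
    dist.all_gather(gathered, my_out)
    return torch.cat(gathered, dim=2)
```

Because the `all_gather` blocks, every rank advances in lockstep; this trades some overlap of compute and communication for a much simpler implementation.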
A100, 50 steps, 2048x2048, SDXL
Framework | sync_mode | n_gpu | latency (s) | speedup | memory (MiB)
---|---|---|---|---|---
Torch | - | 1 | 25.25 | 1.00x | 42147
TRT | - | 1 | 21.98 | 1.15x | 42895
DistriFusion (Torch) | split_batch | 2 | 13.33 | 1.89x | 40173
Ours | split_batch | 2 | 11.69 | 2.16x | 42675
DistriFusion (Torch) | corrected_async_gn | 4 | 8.27 | 3.05x | 49087
DistriFusion (Torch) | full_sync | 4 | 8.64 | 2.92x | 51943
Ours | full_sync | 4 | 7.73 | 3.27x | 43073
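The `split_batch` rows use 2 GPUs, which is consistent with splitting the classifier-free-guidance batch: one rank evaluates the unconditional branch while the other evaluates the conditional one. A minimal sketch under that assumption (names are again illustrative, not this example's real API):

```python
# Illustrative split_batch sketch for classifier-free guidance on 2 GPUs:
# rank 0 computes the unconditional branch, rank 1 the conditional one,
# then both results are exchanged with a blocking all_gather.
import torch
import torch.distributed as dist

def cfg_step_split_batch(unet, latent, t, uncond_emb, cond_emb, scale=7.0):
    rank = dist.get_rank()  # assumes world_size == 2
    emb = uncond_emb if rank == 0 else cond_emb
    my_out = unet(latent, t, emb)

    # After the gather, both ranks hold both branches and can apply
    # the standard guidance formula locally.
    outs = [torch.empty_like(my_out) for _ in range(2)]
    dist.all_gather(outs, my_out)
    noise_uncond, noise_cond = outs
    return noise_uncond + scale * (noise_cond - noise_uncond)
```

The `full_sync` and `corrected_async_gn` rows are patch-parallel variants: `corrected_async_gn` refers to DistriFusion's asynchronous GroupNorm correction, while this PR's `full_sync` keeps every communication step blocking.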
@Zars19 thanks for the contribution to TensorRT-LLM!
@nv-guomingz can you help take care of this? :)
Thanks June
Sure, I'll collaborate with @Zars19 on enabling SDXL with TRT-LLM.
Hi @Zars19, could you please resolve the code conflicts first?
I have resolved the conflict :) @nv-guomingz
Hi @Zars19, thanks for your patience. Could you please update this MR by rebasing/squashing those two commits (including one merge commit) into a single commit? That would make it easier for us to integrate and test.
@nv-guomingz I completed the git rebase
Any updates on the code review?
After rebasing the code, I haven't received feedback for a while now. @nv-guomingz @juney-nvidia