Inference on TPUs instead of GPUs.
Hi folks! Our AI Hypercomputer team ported a Flux inference implementation to MaxDiffusion and successfully ran both the Flux-dev and Flux-schnell models on Google's TPUs.
Running tests on 1024 x 1024 images with flash attention and bfloat16 gave the following results:
| Model | Accelerator | Sharding Strategy | Batch Size | Steps | Time (s) |
|---|---|---|---|---|---|
| Flux-dev | v4-8 | DDP | 4 | 28 | 23 |
| Flux-schnell | v4-8 | DDP | 4 | 4 | 2.2 |
| Flux-dev | v6e-4 | DDP | 4 | 28 | 5.5 |
| Flux-schnell | v6e-4 | DDP | 4 | 4 | 0.8 |
| Flux-schnell | v6e-4 | FSDP | 4 | 4 | 1.2 |
We'd appreciate it if you could give us feedback on these results and on our overall approach.
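For anyone unfamiliar with the sharding terms in the table: under DDP each chip holds a full copy of the model weights and the batch is split across chips, while FSDP additionally shards the weights themselves. Below is a minimal, hypothetical sketch of that difference using `jax.sharding` primitives; the array shapes are placeholders, and this is not MaxDiffusion's actual configuration mechanism.

```python
# Hypothetical illustration of DDP vs. FSDP placement with jax.sharding.
# Shapes are stand-ins; MaxDiffusion drives this through its own configs.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())            # e.g. 4 chips on a v6e-4
mesh = Mesh(devices, axis_names=("data",))

# DDP: weights fully replicated, batch dimension split across chips.
replicated = NamedSharding(mesh, P())        # every chip holds all params
batch_split = NamedSharding(mesh, P("data")) # batch of 4 -> 1 image per chip

# FSDP: weights are also sharded across chips (here along dim 0),
# trading extra gather collectives for lower per-chip memory.
param_split = NamedSharding(mesh, P("data"))

params = jnp.zeros((4096, 4096), dtype=jnp.bfloat16)        # stand-in weight
latents = jnp.zeros((4, 128, 128, 16), dtype=jnp.bfloat16)  # batch of 4

params_ddp = jax.device_put(params, replicated)
params_fsdp = jax.device_put(params, param_split)
latents_ddp = jax.device_put(latents, batch_split)
```

This trade-off may also explain why the FSDP row comes out slower than DDP in the table above: at batch size 4 the extra weight-gather collectives cost more than the per-chip memory they save.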
Hello! As a beginner, I've found this very informative. I don't have any prior experience with PyTorch, Diffusers, or similar frameworks, and I couldn't find any clear documentation on how to run open-source image generation models like Flux Dev on TPUs.
On Google Cloud Platform, a single H100 (spot instance) costs around $1,800 per month, while a v6e-4 TPU instance (I assume this means 4 TPU chips) costs about $1,900 per month.
I'm currently learning how to build my own image-generation infrastructure, which is a very interesting area, and I'd like to hear your thoughts on the best instance configuration for running image-generation workloads in a setup like this.
Do you think a single-H100 GPU instance (around $1,800/month) would be a better choice than a v6e-4 TPU instance, in terms of performance and practicality for this type of workload?
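For context, here's my rough back-of-envelope from the numbers in this thread (my assumptions: ~730 hours in a month, and the Flux-dev v6e-4 row of the table; I don't have comparable H100 throughput figures, so this only covers the TPU side):

```python
# Rough cost math from the figures above (assumptions: 730 h/month,
# and the Flux-dev v6e-4 result of 4 images in 5.5 s at 28 steps).
v6e4_monthly_usd = 1900
hours_per_month = 730
usd_per_hour = v6e4_monthly_usd / hours_per_month   # ~$2.60/h

images_per_second = 4 / 5.5                         # ~0.73 images/s
images_per_hour = images_per_second * 3600          # ~2,618 images/h

usd_per_image = usd_per_hour / images_per_hour      # ~$0.001/image
print(f"${usd_per_hour:.2f}/h, {images_per_hour:.0f} images/h, "
      f"${usd_per_image:.4f}/image")
```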
Thank you.