feat: add icon and description for Stable Diffusion benchmark
- I drew the icon in Figma; we can replace it with one from a designer later.
- Please provide a description for the Stable Diffusion benchmark (@Mostelk).
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Quality Gate passed
Issues
0 New issues
0 Accepted issues
Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code
@AhmedTElthakeb please report the number of parameters and FLOPs of the 3 models we use.
| Model Name | Parameters | MACs |
|---|---|---|
| text_encoder | 123,060,480 | 8.958 G |
| vae_decoder | 49,490,199 | 1,273.718 G |
| sd_diffusion_1 | 447,042,560 | 147.435 G |
| sd_diffusion_2 | 412,478,404 | 281.060 G |
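Since the request above asks for FLOPs but the table reports MACs, a quick sanity check can sum the table and convert, assuming the common 1 MAC ≈ 2 FLOPs convention (one multiply plus one add):

```python
# Per-model values copied verbatim from the table above.
models = {
    "text_encoder": (123_060_480, 8.958e9),
    "vae_decoder": (49_490_199, 1_273.718e9),
    "sd_diffusion_1": (447_042_560, 147.435e9),
    "sd_diffusion_2": (412_478_404, 281.060e9),
}

total_params = sum(p for p, _ in models.values())
total_macs = sum(m for _, m in models.values())
# Assumption: 1 MAC = 2 FLOPs (the usual convention; some tools differ).
total_flops = 2 * total_macs

print(f"Total parameters: {total_params:,}")       # 1,032,071,643
print(f"Total MACs: {total_macs / 1e9:.3f} G")
print(f"Approx FLOPs: {total_flops / 1e9:.3f} G")
```

So the four components together are roughly 1.03 B parameters and about 1,711 GMACs (≈ 3,422 GFLOPs) per image, before multiplying the diffusion parts by the number of denoising steps.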
@Mostelk Please provide a description for the Stable Diffusion benchmark.
Please check this description; we reviewed it in the Wednesday meeting.

The Text-to-Image Gen AI benchmark adopts Stable Diffusion v1.5, a latent diffusion model, for generating images from text prompts. The benchmarked Stable Diffusion v1.5 refers to a specific configuration of the model architecture: a downsampling-factor-8 autoencoder with an 860M-parameter UNet, a 123M-parameter CLIP ViT-L/14 text encoder for the diffusion model, and a VAE decoder of 49.5M parameters. The model was trained for 595k steps at a resolution of 512x512, which enables it to generate high-quality images. See https://huggingface.co/benjamin-paine/stable-diffusion-v1-5 for more information.

The benchmark runs 20 denoising steps for inference and uses a precalculated time embedding of size 1x1280. Reference models can be found here: https://github.com/mlcommons/mobile_open/releases

For latency benchmarking, we measure end to end, excluding the time embedding calculation and the tokenizer. For accuracy, the app adopts the CLIP metric for text-to-image consistency, with further evaluation of the generated images using the Image Quality Aesthetic Assessment metric: https://github.com/idealo/image-quality-assessment/tree/master?tab=readme-ov-file
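For reference, the CLIP text-to-image consistency metric reduces to a cosine similarity between CLIP text and image embeddings. A minimal sketch of just the scoring step (the embeddings themselves would come from a CLIP model such as ViT-L/14, omitted here; the `w = 100` scaling and clamping at zero follow the common CLIPScore definition, which may differ from the app's exact implementation):

```python
import math

def clip_score(text_emb, image_emb, w=100.0):
    """CLIPScore-style consistency: w * max(cos(text, image), 0).

    text_emb / image_emb are assumed to be embedding vectors produced
    by a CLIP model; the model itself is omitted here for brevity.
    """
    dot = sum(t * i for t, i in zip(text_emb, image_emb))
    norm_t = math.sqrt(sum(t * t for t in text_emb))
    norm_i = math.sqrt(sum(i * i for i in image_emb))
    return w * max(dot / (norm_t * norm_i), 0.0)

# Toy embeddings: aligned directions score 100, orthogonal score 0.
print(clip_score([1.0, 0.0], [2.0, 0.0]))  # 100.0
print(clip_score([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Higher scores indicate the generated image is more semantically consistent with the prompt.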