
feat: add icon and description for Stable Diffusion benchmark

Open · anhappdev opened this issue 1 year ago • 3 comments

  • The icon is drawn by me using Figma. We can replace it with one from a designer later.
  • Please provide a description for the Stable Diffusion benchmark (@Mostelk).

anhappdev avatar Sep 12 '24 06:09 anhappdev

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

github-actions[bot] avatar Sep 12 '24 06:09 github-actions[bot]

@AhmedTElthakeb please report the number of parameters and FLOPs of the 3 models we use.

freedomtan avatar Sep 24 '24 05:09 freedomtan

| Model Name | Parameters | MACs |
| --- | ---: | ---: |
| text_encoder | 123,060,480 | 8.958 G |
| vae_decoder | 49,490,199 | 1,273.718 G |
| sd_diffusion_1 | 447,042,560 | 147.435 G |
| sd_diffusion_2 | 412,478,404 | 281.060 G |

AhmedTElthakeb avatar Oct 01 '24 04:10 AhmedTElthakeb
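For context, per-model MAC counts like those in the table are usually derived analytically from layer shapes. A minimal sketch of that arithmetic (the 320-channel 3x3 convolution and the 64x64 latent grid below are illustrative placeholders, not taken from the actual models):

```python
# Hedged sketch: how MAC counts like those in the table can be derived
# analytically from layer shapes. The example layer sizes are
# illustrative, not the real Stable Diffusion layers.

def conv2d_macs(in_ch, out_ch, kernel, out_h, out_w):
    """MACs for a plain 2D convolution: one multiply-accumulate per
    kernel tap, per input channel, per output element."""
    return in_ch * out_ch * kernel * kernel * out_h * out_w

def linear_macs(in_features, out_features, tokens=1):
    """MACs for a dense layer applied across `tokens` positions."""
    return in_features * out_features * tokens

# Example: a 3x3 conv from 320 -> 320 channels on a 64x64 latent grid.
macs = conv2d_macs(320, 320, 3, 64, 64)
flops = 2 * macs  # 1 MAC = 1 multiply + 1 add
print(f"{macs / 1e9:.3f} GMACs, {flops / 1e9:.3f} GFLOPs")
```

Summing such per-layer counts over a model gives totals on the scale reported above; note FLOPs are conventionally about 2x the MAC count.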

@Mostelk Please provide a description for the Stable Diffusion benchmark.

anhappdev avatar Oct 24 '24 09:10 anhappdev

> @Mostelk Please provide a description for the Stable Diffusion benchmark.

Please check this description; we reviewed it in the Wednesday meeting.

The Text-to-Image Gen AI benchmark adopts Stable Diffusion v1.5, a latent diffusion model, for generating images from text prompts. The benchmarked Stable Diffusion v1.5 refers to a specific configuration of the model architecture: a downsampling-factor-8 autoencoder, an 860M-parameter UNet as the diffusion model, a 123M-parameter CLIP ViT-L/14 text encoder, and a VAE decoder of 49.5M parameters. The model was trained for 595k steps at a resolution of 512x512, which enables it to generate high-quality images. We refer you to https://huggingface.co/benjamin-paine/stable-diffusion-v1-5 for more information.

The benchmark runs 20 denoising steps for inference and uses a precalculated time embedding of size 1x1280. Reference models can be found here: https://github.com/mlcommons/mobile_open/releases

For latency benchmarking, we benchmark end to end, excluding the time-embedding calculation and the tokenizer. For accuracy calculations, the app adopts the CLIP metric for text-to-image consistency, with further evaluation of the generated images using the Image Quality Aesthetic Assessment metric: https://github.com/idealo/image-quality-assessment/tree/master?tab=readme-ov-file

Mostelk avatar Dec 12 '24 00:12 Mostelk
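The CLIP consistency metric mentioned above can be sketched as a cosine similarity between CLIP embeddings of the prompt and the generated image. A minimal sketch, assuming the common CLIPScore formulation (Hessel et al., `w * max(cos, 0)` with `w = 2.5`); whether the app uses exactly this weighting is an assumption, and the embeddings below are random placeholders rather than real CLIP encoder outputs:

```python
import numpy as np

def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore as commonly defined: w * max(cosine similarity, 0).
    In practice the embeddings come from a CLIP image/text encoder;
    here they are plain vectors."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return w * max(float(image_emb @ text_emb), 0.0)

# Illustrative embeddings (real CLIP ViT-L/14 embeddings are 768-dim).
rng = np.random.default_rng(0)
img = rng.normal(size=768)
txt = img + 0.1 * rng.normal(size=768)  # a well-aligned image/text pair
print(round(clip_score(img, txt), 3))
```

A higher score indicates better text-to-image consistency; identical embeddings give the maximum of `w`, and orthogonal or opposed embeddings floor at 0.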