
feat: add icon and description for Stable Diffusion benchmark

Open · anhappdev opened this issue 1 year ago • 3 comments

  • The icon is drawn by me using Figma. We can replace it with one from a designer later.
  • Please provide a description for the Stable Diffusion benchmark (@Mostelk).

anhappdev avatar Sep 12 '24 06:09 anhappdev

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

github-actions[bot] avatar Sep 12 '24 06:09 github-actions[bot]

@AhmedTElthakeb please report the number of parameters and FLOPs of the 3 models we use.

freedomtan avatar Sep 24 '24 05:09 freedomtan

| Model Name | Parameters | MACs |
| --- | ---: | ---: |
| text_encoder | 123,060,480 | 8.958 G |
| vae_decoder | 49,490,199 | 1,273.718 G |
| sd_diffusion_1 | 447,042,560 | 147.435 G |
| sd_diffusion_2 | 412,478,404 | 281.060 G |

AhmedTElthakeb avatar Oct 01 '24 04:10 AhmedTElthakeb
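For context, per-model MAC counts like those in the table are usually derived analytically from layer shapes. A minimal sketch of that arithmetic (the 320-channel 3x3 convolution and the 64x64 latent grid below are illustrative placeholders, not taken from the actual models):

```python
# Hedged sketch: how MAC counts like those in the table can be derived
# analytically from layer shapes. The example layer sizes are
# illustrative, not the real Stable Diffusion layers.

def conv2d_macs(in_ch, out_ch, kernel, out_h, out_w):
    """MACs for a plain 2D convolution: one multiply-accumulate per
    kernel tap, per input channel, per output element."""
    return in_ch * out_ch * kernel * kernel * out_h * out_w

def linear_macs(in_features, out_features, tokens=1):
    """MACs for a dense layer applied across `tokens` positions."""
    return in_features * out_features * tokens

# Example: a 3x3 conv from 320 -> 320 channels on a 64x64 latent grid.
macs = conv2d_macs(320, 320, 3, 64, 64)
flops = 2 * macs  # 1 MAC = 1 multiply + 1 add
print(f"{macs / 1e9:.3f} GMACs, {flops / 1e9:.3f} GFLOPs")
```

Summing such per-layer counts over a model gives totals on the scale reported above; note FLOPs are conventionally about 2x the MAC count.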

@Mostelk Please provide a description for the Stable Diffusion benchmark.

anhappdev avatar Oct 24 '24 09:10 anhappdev

> @Mostelk Please provide a description for the Stable Diffusion benchmark.

Please check this description; we reviewed it in the Wednesday meeting.

The Text-to-Image Gen AI benchmark adopts Stable Diffusion v1.5, a latent diffusion model, for generating images from text prompts. The benchmarked Stable Diffusion v1.5 refers to a specific configuration of the model architecture: a downsampling-factor-8 autoencoder, an 860M-parameter UNet as the diffusion model, a 123M-parameter CLIP ViT-L/14 text encoder, and a VAE decoder of 49.5M parameters. The model was trained for 595k steps at a resolution of 512x512, which enables it to generate high-quality images. We refer you to https://huggingface.co/benjamin-paine/stable-diffusion-v1-5 for more information.

The benchmark runs 20 denoising steps for inference and uses a precalculated time embedding of size 1x1280. Reference models can be found here: https://github.com/mlcommons/mobile_open/releases

For latency benchmarking, we benchmark end to end, excluding the time-embedding calculation and the tokenizer. For accuracy calculations, the app adopts the CLIP metric for text-to-image consistency, with further evaluation of the generated images using the Image Quality Aesthetic Assessment metric: https://github.com/idealo/image-quality-assessment/tree/master?tab=readme-ov-file

Mostelk avatar Dec 12 '24 00:12 Mostelk
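The CLIP consistency metric mentioned above can be sketched as a cosine similarity between CLIP embeddings of the prompt and the generated image. A minimal sketch, assuming the common CLIPScore formulation (Hessel et al., `w * max(cos, 0)` with `w = 2.5`); whether the app uses exactly this weighting is an assumption, and the embeddings below are random placeholders rather than real CLIP encoder outputs:

```python
import numpy as np

def clip_score(image_emb, text_emb, w=2.5):
    """CLIPScore as commonly defined: w * max(cosine similarity, 0).
    In practice the embeddings come from a CLIP image/text encoder;
    here they are plain vectors."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_emb = text_emb / np.linalg.norm(text_emb)
    return w * max(float(image_emb @ text_emb), 0.0)

# Illustrative embeddings (real CLIP ViT-L/14 embeddings are 768-dim).
rng = np.random.default_rng(0)
img = rng.normal(size=768)
txt = img + 0.1 * rng.normal(size=768)  # a well-aligned image/text pair
print(round(clip_score(img, txt), 3))
```

A higher score indicates better text-to-image consistency; identical embeddings give the maximum of `w`, and orthogonal or opposed embeddings floor at 0.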