`mixed_bfloat16` in TPU is slower than `float32`

Open chenmoneygithub opened this issue 2 years ago • 3 comments

In short, we observed that mixed_bfloat16 on TPU is slower than float32 in our model benchmarks. Please refer to this sheet (internal only) for comparison results.

To reproduce with the JAX backend on a TPU VM, use the command below:

cd benchmarks
KERAS_BACKEND=jax python3 -m model_benchmark.image_classification_benchmark  \
   --model="ResNet50V2"  \
   --epochs=1 \
   --batch_size=32 \
   --mixed_precision_policy="mixed_bfloat16"

To reproduce with the TF backend, you need to modify the code to connect to the TPU and use a TPU strategy.
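For anyone comparing the two policies by hand, here is a minimal timing sketch, not the benchmark script's actual implementation. The helper name `time_steps` and its parameters are hypothetical. It assumes `step_fn` runs one training or inference step and blocks until the result is materialized (with JAX, dispatch is asynchronous, so the step should force the output, for example by converting it to a NumPy array). Warm-up calls are discarded so that one-time compilation cost does not skew the comparison.

```python
import statistics
import time
from typing import Callable


def time_steps(step_fn: Callable[[], object], warmup: int = 3, repeats: int = 10) -> float:
    """Return the median wall-clock seconds per call of step_fn.

    Hypothetical helper: warm-up calls are run but not timed, so
    one-time costs such as JIT compilation are excluded.
    """
    for _ in range(warmup):
        step_fn()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        step_fn()  # must block until the step has actually finished
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Running the same `step_fn` once under each policy and comparing the two medians gives a like-for-like number that is less sensitive to outliers than a single epoch time.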

chenmoneygithub avatar Jul 07 '23 19:07 chenmoneygithub

Hi @chenmoneygithub -

The sheet is accessible for me. Mixed precision will only speed up models on recent NVIDIA GPUs and Google TPUs. NVIDIA GPUs support using a mix of float16 and float32, while TPUs support a mix of bfloat16 and float32. You can find more details here.
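To illustrate how the two 16-bit formats differ, here is a small standard-library sketch. It is an approximation for illustration only: `to_bfloat16` is a hypothetical helper that emulates bfloat16 by truncating the low 16 bits of a float32, whereas real hardware typically rounds to nearest. bfloat16 keeps float32's 8 exponent bits (wide range, 7 mantissa bits), while float16 has 5 exponent bits and 10 mantissa bits (narrower range, finer precision).

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round-trip x through a bfloat16-like value.

    Assumption: truncation of the low 16 float32 bits, not the
    round-to-nearest behavior of real accelerators.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def to_float16(x: float) -> float:
    """Round-trip x through IEEE float16; values too large become inf."""
    try:
        return struct.unpack(">e", struct.pack(">e", x))[0]
    except OverflowError:
        return float("inf")


# Range: 70000 exceeds the float16 maximum (65504) but fits in bfloat16.
print(to_float16(70000.0))   # inf
print(to_bfloat16(70000.0))  # 69632.0

# Precision: 1 + 2**-10 is exact in float16 but lost in bfloat16.
print(to_float16(1.0009765625))   # 1.0009765625
print(to_bfloat16(1.0009765625))  # 1.0
```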

Which hardware are you using for mixed_bfloat16 and float32?

mehtamansi29 avatar Sep 13 '24 18:09 mehtamansi29

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar Sep 28 '24 02:09 github-actions[bot]

@mehtamansi29 Thanks for looking into that! I am not sure the result is still valid; that benchmark was run before the first official release of Keras 3. The TPU was a v3-8, which is a very old generation as of today.

chenmoneygithub avatar Sep 29 '24 00:09 chenmoneygithub

Hi @chenmoneygithub - Since the benchmark was done before the release of Keras 3 and on TPU v3-8, the results may not reflect current performance. Also, the sheet is not accessible for me. Could you please share a brief summary of the comparison results? Thanks!

sonali-kumari1 avatar Sep 01 '25 06:09 sonali-kumari1

This issue is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

github-actions[bot] avatar Sep 16 '25 02:09 github-actions[bot]

This issue was closed because it has been inactive for 28 days. Please reopen if you'd like to work on this further.

github-actions[bot] avatar Oct 01 '25 02:10 github-actions[bot]
