
use --use_sdpa_with_kv_cache for 1B/3B bf16

Open helunwencser opened this issue 4 months ago • 2 comments

Stack from ghstack (oldest at bottom):

  • -> #5861

We should pass --use_sdpa_with_kv_cache when exporting the 1B/3B models as bf16, because the KV cache is always fp32. Without it, the 1B/3B models show regressed performance in bf16 format.
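The rule above can be sketched as a small helper, shown here only as an illustration. The function name `extra_export_flags` and its `model_size` and `dtype` string arguments are hypothetical and are not part of the executorch export script. Only the flag itself comes from this PR.

```python
from typing import List

# The flag named in this PR's title.
SDPA_FLAG = "--use_sdpa_with_kv_cache"


def extra_export_flags(model_size: str, dtype: str) -> List[str]:
    """Hypothetical helper: extra export flags for a model size and dtype.

    Adds the SDPA-with-KV-cache flag only for the case this PR describes
    (1B/3B models exported as bf16). Other combinations are left untouched,
    since the PR makes no claim about them.
    """
    flags: List[str] = []
    if model_size in {"1B", "3B"} and dtype == "bf16":
        flags.append(SDPA_FLAG)
    return flags


if __name__ == "__main__":
    print(extra_export_flags("1B", "bf16"))
```

The returned list would be appended to whatever arguments the export command already receives. How those arguments are assembled is outside the scope of this sketch.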

Differential Revision: D63871048

helunwencser, Oct 03 '24 23:10