daejin

1 comment by daejin
I'm looking for clarification on why the `query_pre_attn_scalar` value was changed from 224 (`d_model` / `# heads`) to `head_dim` (256) specifically for the 9B model in the [latest commit](https://github.com/google/gemma_pytorch/commit/03e657582d17cb5a8617ebf333c1c16f3694670e), while...
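For context on what this config value controls: in gemma_pytorch, queries are multiplied by `query_pre_attn_scalar ** -0.5` before the `q @ k^T` product, in place of the usual `head_dim ** -0.5` softmax scaling. A minimal sketch of the arithmetic behind the question (the 9B config values `hidden_size=3584`, `num_heads=16`, `head_dim=256` are assumptions taken from the model card, not from this thread):

```python
import math

def attn_scale(query_pre_attn_scalar: int) -> float:
    # Factor applied to queries before q @ k^T,
    # replacing the conventional 1/sqrt(head_dim).
    return query_pre_attn_scalar ** -0.5

# Assumed 9B dimensions: head_dim (256) != hidden_size / num_heads (224),
# which is why the two candidate scalars differ for this model.
hidden_size, num_heads, head_dim = 3584, 16, 256
old_scalar = hidden_size // num_heads   # 224
new_scalar = head_dim                   # 256

print(old_scalar, new_scalar)                          # 224 256
print(attn_scale(old_scalar) > attn_scale(new_scalar))  # True: the new value scales queries down slightly more
```

The distinction only matters for models where `head_dim * num_heads != hidden_size`; when the two quantities coincide, both choices give the same scale.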