Roaim

5 comments of Roaim

Noticed that attn_mask has shape (1, 1447, 1, 5234) while the logits are (1, 1447, 8, 4096). That looks like a broadcasting mismatch between the mask and the attention logits, possibly the...
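The mismatch can be reproduced with a minimal NumPy sketch (scaled-down stand-in shapes, since the real tensors are large): the trailing dimensions 5234 and 4096 differ and neither is 1, so NumPy-style broadcasting between the mask and the logits must fail.

```python
import numpy as np

# Scaled-down analogues of the reported shapes:
# attn_mask (1, 1447, 1, 5234) vs logits (1, 1447, 8, 4096).
mask = np.zeros((1, 14, 1, 52))    # stands in for (1, 1447, 1, 5234)
logits = np.zeros((1, 14, 8, 40))  # stands in for (1, 1447, 8, 4096)

# Broadcasting aligns trailing axes; 52 vs 40 cannot broadcast
# because neither is 1, so combining them raises a ValueError.
try:
    _ = logits + mask
    compatible = True
except ValueError:
    compatible = False

print(compatible)
```

Running this prints `False`, which matches the reported shape error: either the mask's last axis should equal the logits' key length, or one of the two shapes is being built against the wrong sequence length.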

Hi @Balakrishna-Chennamsetti, note that PR [#403](https://github.com/google-deepmind/gemma/pull/403) was created before this comment. It improves the description of system requirements in README.md, specifically the GPU models suited to the different Gemma checkpoints....

@abhipatel12 More than happy to help with these updates.

@abhipatel12 I noticed "WriteBinary" is mentioned in the docs, but it doesn’t seem to exist in the code. I’ve converted it to snake_case for consistency, but I can remove it...
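For the rename mentioned above, a small helper like the following (a hypothetical sketch, not code from the repository) shows the kind of CamelCase-to-snake_case conversion applied to names such as `WriteBinary`:

```python
import re

def to_snake_case(name: str) -> str:
    """Convert a CamelCase identifier to snake_case."""
    # Insert an underscore before each uppercase letter that is not
    # at the start of the string, then lowercase the whole result.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

print(to_snake_case("WriteBinary"))  # write_binary
```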