Roaim
Noticed that attn_mask has shape (1, 1447, 1, 5234) while logits are (1, 1447, 8, 4096). Looks like a broadcasting mismatch between the mask and attention logits — possibly the...
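For anyone comparing these shapes, here is a minimal sketch (mine, not from the library) of why they cannot broadcast: under NumPy-style rules, trailing axes must be equal or 1, and 4096 vs 5234 is neither. The axis labels below are my assumption about the layout, not confirmed from the code.

```python
import jax.numpy as jnp

# Shapes reported above; axis labels are a guess at the intended layout.
logits_shape = (1, 1447, 8, 4096)   # (batch, q_len, num_heads, kv_len)
mask_shape   = (1, 1447, 1, 5234)   # size-1 head axis broadcasts fine,
                                    # but the last axis disagrees

try:
    jnp.broadcast_shapes(logits_shape, mask_shape)
except ValueError as err:
    # Fails: the trailing axes (4096 vs 5234) are neither equal nor 1.
    print(f"broadcast error: {err}")
```

If the mask's last axis is meant to index key/value positions, the mismatch suggests the mask was built for a different (padded?) sequence length than the logits.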
Hi @Balakrishna-Chennamsetti, note that PR [#403](https://github.com/google-deepmind/gemma/pull/403) was created before this comment. It adds an improved description of the system requirements to README.md, specifically the GPU models recommended for the different Gemma checkpoints....
@abhipatel12 More than happy to help with these updates.
@abhipatel12 I noticed "WriteBinary" is mentioned in the docs, but it doesn’t seem to exist in the code. I’ve converted it to snake_case for consistency, but I can remove it...
@happyhuman Happy to contribute.