hihiruby

Results: 1 comment of hihiruby

Add pad_to_multiple_of=8; most tokenizers (and Gemma's processor) support pad_to_multiple_of. That padding removes the length misalignment, so FlashAttention can run without crashing, and it uses far less memory than the math...
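
A minimal sketch of what this looks like with a Hugging Face tokenizer; the checkpoint name and prompts below are placeholders, not taken from the original comment:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint; any tokenizer that accepts pad_to_multiple_of works the same way.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b")

batch = tokenizer(
    ["short prompt", "a somewhat longer prompt in the same batch"],
    padding=True,            # pad to the longest sequence in the batch
    pad_to_multiple_of=8,    # then round that length up to the next multiple of 8
    return_tensors="pt",
)

# batch["input_ids"].shape[1] is now a multiple of 8, keeping the sequence
# length aligned for FlashAttention kernels instead of crashing on odd lengths.
print(batch["input_ids"].shape)
```

The same keyword can usually be forwarded through a processor call (e.g. Gemma's AutoProcessor passes tokenizer kwargs through), so no manual padding of input_ids is needed.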