Liger-Kernel
Need to investigate Gemma3 implementation with Liger
🐛 Describe the bug
The tolerance used when comparing loss for the gemma3 multimodal model needs to be set much higher (atol, rtol = 1e-3) than for other models (atol = 1e-8, rtol = 1e-5) in order for the tests to pass. Similarly, gemma3_text needs atol = 3e-1 and rtol = 4e-1 to pass the tests when comparing the top 20 log probs.
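To make the size of the gap concrete, here is a minimal pure-Python sketch of the closeness rule that `torch.allclose` documents, `|actual - expected| <= atol + rtol * |expected|`. The helper names and the loss values are hypothetical and chosen only for illustration; they are not measurements from the Liger-Kernel tests.

```python
def max_allowed_error(expected, atol, rtol):
    """Largest absolute difference accepted for a single expected value."""
    return atol + rtol * abs(expected)


def within_tolerance(actual, expected, atol, rtol):
    """Elementwise check: |actual - expected| <= atol + rtol * |expected|."""
    return all(
        abs(a - e) <= max_allowed_error(e, atol, rtol)
        for a, e in zip(actual, expected)
    )


# Hypothetical loss values (illustrative only, not taken from the tests).
reference_loss = [2.3456]
liger_loss = [2.3470]

# Tolerances used for the other models.
strict = within_tolerance(liger_loss, reference_loss, atol=1e-8, rtol=1e-5)
# Tolerances currently needed for the gemma3 multimodal model.
loose = within_tolerance(liger_loss, reference_loss, atol=1e-3, rtol=1e-3)

# Error budget of the gemma3_text tolerances for a hypothetical log prob of -2.0.
text_budget = max_allowed_error(-2.0, atol=3e-1, rtol=4e-1)

print(strict, loose, text_budget)
```

With these made-up numbers the strict check fails and the loose one passes. Under the gemma3_text tolerances, a log prob of -2.0 may be off by roughly 1.1 and still count as a match, which is why the current settings seem worth investigating.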
Reproduce
No response
Versions
Operating System: Linux-5.15.180.1-1.cm2-x86_64-with-glibc2.35
Python version: 3.10.14
Liger Kernel version: 0.5.10
PyTorch version: 2.7.1+cu126
CUDA version: 12.6
Triton version: 3.3.1
Transformers version: 4.52.4