Liger-Kernel
Implement softcapping in fused JSD
Summary
Implements logit softcapping in the fused linear JSD so it can be used with Gemma 2 models.
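Softcapping as used by Gemma 2 is commonly written as `cap * tanh(x / cap)`. It smoothly bounds every logit to the interval (-cap, cap) while staying close to the identity when |x| is much smaller than cap. A fused kernel also needs the local derivative, `1 - tanh(x / cap)**2`, to chain gradients back through the cap in the backward pass. The pure-Python sketch below assumes that formulation; `softcap` and `softcap_grad` are hypothetical helper names, not Liger-Kernel functions.

```python
import math


def softcap(logits: list[float], cap: float) -> list[float]:
    """Squash each logit toward (-cap, cap) via cap * tanh(x / cap)."""
    return [cap * math.tanh(x / cap) for x in logits]


def softcap_grad(logits: list[float], cap: float) -> list[float]:
    """Elementwise derivative: d/dx [cap * tanh(x / cap)] = 1 - tanh(x / cap)**2."""
    return [1.0 - math.tanh(x / cap) ** 2 for x in logits]


# Small logits pass through almost unchanged; large ones saturate near the cap.
print(softcap([0.01, 1e6, -1e6], 50.0))
print(softcap_grad([0.0, 10.0], 30.0))
```

Note that the cap cancels in the derivative's scale: the gradient is 1 at zero and decays toward 0 as the logit saturates, so very large caps behave almost like no capping at all.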
Details
Assumes the same softcap value for the teacher and student models.
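Assuming a single shared cap, the loss can be sketched as: project each model's hidden state through its own output weights, softcap both sets of logits with the same value, then compute the JSD between the resulting distributions. This unfused, pure-Python reference uses the plain symmetric JSD with equal weights (1/2 each); the actual kernel operates on tensors in chunks and may weight the two KL terms differently. All function names and the list-of-rows weight layout are hypothetical.

```python
import math


def softcap(logits, cap):
    # Same cap * tanh(x / cap) formulation for both models.
    return [cap * math.tanh(x / cap) for x in logits]


def log_softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def project(hidden, weight):
    # Hypothetical lm_head: one row of `weight` per vocabulary entry.
    return [sum(h * w for h, w in zip(hidden, row)) for row in weight]


def jsd(log_p, log_q):
    # Symmetric JSD: 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2.
    p = [math.exp(x) for x in log_p]
    q = [math.exp(x) for x in log_q]
    m = [0.5 * (a + b) for a, b in zip(p, q)]
    kl_pm = sum(a * (la - math.log(c)) for a, la, c in zip(p, log_p, m) if a > 0)
    kl_qm = sum(b * (lb - math.log(c)) for b, lb, c in zip(q, log_q, m) if b > 0)
    return 0.5 * (kl_pm + kl_qm)


def softcapped_linear_jsd(student_h, student_w, teacher_h, teacher_w, cap):
    # The single `cap` is shared by student and teacher, matching the PR's assumption.
    s = log_softmax(softcap(project(student_h, student_w), cap))
    t = log_softmax(softcap(project(teacher_h, teacher_w), cap))
    return jsd(s, t)


h = [1.0, -2.0]
w_student = [[0.5, 0.1], [0.0, 1.0], [-1.0, 0.3]]
w_teacher = [[0.5, 0.1], [2.0, -1.0], [-1.0, 0.3]]
print(softcapped_linear_jsd(h, w_student, h, w_teacher, 30.0))
```

Because the same cap is applied to both models, identical student and teacher inputs give a loss of zero; supporting different caps would only require passing two separate values to the two `softcap` calls.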
Testing Done
- Added tests for softcapping in `test_fused_linear_jsd.py`
- Hardware Type: L40S
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [ ] run `make test-convergence` to ensure convergence
@wheynelau I just fixed some conflicts caused by the branch being out of sync 😃 FYI, `test_correctness_functional` is failing for me on an A100 when softcap=50.0. Raw output here: https://gist.github.com/yundai424/7bbfa78f05667749ce189cd458cabf90
@yundai424 Okay thanks! Let me take a look at this
@yundai424 Have updated it!