Liger-Kernel

Implement softcapping in fused jsd

Open wheynelau opened this issue 1 year ago • 3 comments
Summary

Implements softcapping in the fused linear JSD loss so that it can be used with Gemma 2 models.

Details

Assumes the same softcap value for both the teacher and the student model.
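For readers unfamiliar with softcapping, here is a minimal pure-Python sketch of the idea, not the fused Triton kernel from this PR. It assumes the usual Gemma 2 style cap, `cap * tanh(logits / cap)`, applied with one shared value to both sets of logits before a weighted JSD is computed. The function names, the `beta` weighting, and which argument plays the teacher role are illustrative assumptions and may not match the library's conventions.

```python
import math


def softcap(logits, cap):
    """Smoothly squash each logit into (-cap, cap) via cap * tanh(x / cap)."""
    return [cap * math.tanh(x / cap) for x in logits]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    peak = max(logits)
    lse = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - lse for x in logits]


def jsd(student_logits, teacher_logits, beta=0.5, cap=None):
    """Weighted JSD between two logit vectors (hypothetical helper).

    If `cap` is given, the SAME softcap is applied to student and teacher,
    mirroring the assumption stated in this PR.
    """
    if cap is not None:
        student_logits = softcap(student_logits, cap)
        teacher_logits = softcap(teacher_logits, cap)

    log_q = log_softmax(student_logits)
    log_p = log_softmax(teacher_logits)
    q = [math.exp(v) for v in log_q]
    p = [math.exp(v) for v in log_p]

    # Mixture distribution M = beta * P + (1 - beta) * Q.
    log_m = [math.log(beta * pi + (1 - beta) * qi) for pi, qi in zip(p, q)]

    # JSD = beta * KL(P || M) + (1 - beta) * KL(Q || M).
    kl_p = sum(pi * (lp - lm) for pi, lp, lm in zip(p, log_p, log_m))
    kl_q = sum(qi * (lq - lm) for qi, lq, lm in zip(q, log_q, log_m))
    return beta * kl_p + (1 - beta) * kl_q
```

Calling `jsd(student, teacher, cap=50.0)` caps both inputs identically, while `cap=None` leaves the logits untouched. A small cap pulls the two distributions closer together, so the capped loss is generally lower than the uncapped one for strongly disagreeing logits.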

Testing Done

  • added tests for softcapping in test_fused_linear_jsd.py

  • Hardware Type: L40S

  • [x] run make test to ensure correctness

  • [x] run make checkstyle to ensure code style

  • [ ] run make test-convergence to ensure convergence

wheynelau avatar Nov 21 '24 03:11 wheynelau

@wheynelau I just fixed some conflicts from it being out of sync 😃 FYI, test_correctness_functional is failing for me on an A100 when softcap=50.0. Raw output here: https://gist.github.com/yundai424/7bbfa78f05667749ce189cd458cabf90

yundai424 avatar Nov 22 '24 01:11 yundai424

@yundai424 Okay thanks! Let me take a look at this

wheynelau avatar Nov 22 '24 01:11 wheynelau

@yundai424 Have updated it!

wheynelau avatar Nov 22 '24 12:11 wheynelau