Liger-Kernel

add batch_norm op with test and benchmark

Open yanghailong-git opened this issue 9 months ago • 2 comments

Summary

Implemented a 2D batch normalization operator in Triton, ran the corresponding tests and benchmarks, and plotted the benchmark results for speed and memory.
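For readers who want a reference point, here is a minimal sketch of what a training-mode 2D batch normalization computes, written in plain Python rather than Triton. It is not the kernel from this PR: the function name `batch_norm_2d`, the nested-list `[N][C][H][W]` layout, and the default `eps` are assumptions made for illustration, and running statistics are left out.

```python
import math


def batch_norm_2d(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm for an input laid out as [N][C][H][W] nested lists.

    Hypothetical reference helper for illustration only, not the PR's Triton kernel.
    """
    n, c = len(x), len(x[0])
    h, w = len(x[0][0]), len(x[0][0][0])
    count = n * h * w
    out = [[[[0.0] * w for _ in range(h)] for _ in range(c)] for _ in range(n)]
    for ch in range(c):
        # Statistics are computed per channel, over the batch and spatial dims.
        vals = [x[b][ch][i][j] for b in range(n) for i in range(h) for j in range(w)]
        mean = sum(vals) / count
        var = sum((v - mean) ** 2 for v in vals) / count  # biased variance
        inv_std = 1.0 / math.sqrt(var + eps)
        for b in range(n):
            for i in range(h):
                for j in range(w):
                    normalized = (x[b][ch][i][j] - mean) * inv_std
                    # Per-channel affine transform: scale (gamma) and shift (beta).
                    out[b][ch][i][j] = normalized * gamma[ch] + beta[ch]
    return out
```

A slow reference like this can be useful for checking a fused kernel's output on small inputs, since each channel of the result should have mean close to `beta` and standard deviation close to `gamma`.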

Testing Done

  • Hardware Type: <BLANK>
  • [x] run make test to ensure correctness
  • [x] run make checkstyle to ensure code style
  • [x] run make test-convergence to ensure convergence

Performance visualization: [figure: batch_norm_speed] [figure: batch_norm_memory]

yanghailong-git, Feb 07 '25 13:02

looks like from the benchmark result triton impl is slower than HF original one? 👀

yundai424, Feb 11 '25 06:02

> looks like from the benchmark result triton impl is slower than HF original one? 👀

It seems so. The memory usage is about the same, but the Triton version is a bit slower. Do you have any suggestions for optimizing it?

yanghailong-git, Feb 12 '25 02:02