Liger-Kernel

Efficient Triton Kernels for LLM Training

Results 114 Liger-Kernel issues

## Summary Closes: https://github.com/linkedin/Liger-Kernel/issues/538 ```bash pip3 install -e .[dev] ``` Setup succeeded on my CPU and H100 machines; I need help testing on ROCm. ## Testing Done - Hardware Type: -...

## Summary Added batchNorm ## Testing Done I compared it against Keras's batch norm on a 4090. - Hardware Type: - [x] run `make test` to ensure correctness...

## Summary We need flex attention for custom attentions/masks to achieve better performance (for example, [shared prefix](https://github.com/frankxwang/dpo-prefix-sharing)) Two ways to enable flex attention in liger: 1. Set the `attn_implementation` of...

## Summary Remove redundant code by refactoring ## Testing Done - Hardware Type: - [ ] run `make test` to ensure correctness - [x] run `make checkstyle` to ensure code...

This PR adds a test for the `ref_input` parameter that was introduced in #467. ### Changes - Add `test_ref_input.py` to verify the `ref_input` parameter works correctly in `LigerFusedLinearPreferenceBase` - Test...

## Summary Implement the on-paper form of the RoPE kernel from [RoFormer](https://arxiv.org/pdf/2104.09864.). This implementation does not support optional value input, unlike the HuggingFace [RoFormer RoPE](https://github.com/huggingface/transformers/blob/v4.46.0/src/transformers/models/roformer/modeling_roformer.py#L309) implementation. ## Details The code...
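For readers unfamiliar with the "on-paper form" mentioned above, here is a minimal pure-Python sketch of the rotation as the RoFormer paper writes it: adjacent pairs `(x[2i], x[2i+1])` are rotated by the angle `position * base**(-2i/d)`. This is not the PR's Triton kernel; the function names `rope_interleaved` and `dot` and the default `base=10000.0` are assumptions chosen for illustration.

```python
import math


def rope_interleaved(x, position, base=10000.0):
    """Rotate adjacent pairs of x by position-dependent angles (illustrative only)."""
    d = len(x)
    assert d % 2 == 0, "the embedding dimension must be even"
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)  # per-pair frequency
        angle = position * theta
        c, s = math.cos(angle), math.sin(angle)
        x0, x1 = x[2 * i], x[2 * i + 1]
        # Standard 2D rotation applied to the pair (x0, x1).
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
# The score depends only on the relative offset (2 in both cases).
score_a = dot(rope_interleaved(q, 3), rope_interleaved(k, 1))
score_b = dot(rope_interleaved(q, 5), rope_interleaved(k, 3))
```

The two scores agree up to floating-point error, which is the relative-position property the rotation is designed to provide.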

## Summary Implements `softcap` in the fused linear JSD, so it can be used for `gemma2` models ## Details Assumes the same softcap for the teacher and student models ## Testing Done...
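As a rough illustration of what soft-capping does to logits, the sketch below applies the common `cap * tanh(x / cap)` form in plain Python. It is a sketch under assumptions, not the fused kernel from the PR: the helper name `softcap` and the cap value `30.0` are chosen here only for the example.

```python
import math


def softcap(logits, cap):
    """Smoothly squash each logit into the open interval (-cap, cap)."""
    return [cap * math.tanh(x / cap) for x in logits]


# Large magnitudes are bounded; small values pass through almost unchanged.
capped = softcap([-100.0, 0.0, 1.0, 100.0], cap=30.0)
```

Applying the same cap to both teacher and student logits, as the summary above assumes, keeps the two distributions on the same scale before the divergence is computed.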

## Summary Our chunked loss functions currently hard-code the chunk size to 1. However, this can leave GPU memory underutilized. In this PR we show how the...

## Summary Implemented a 2D batch normalization Triton operator, successfully ran the corresponding tests and benchmarks, and visualized the performance tests for speed and memory. ## Testing Done - Hardware...

## Summary Resolves #129 Add a monkeypatch to support the DeepSeek-V2 model. ## Details Ops patched: - rms_norm - swiglu - cross_entropy - fused_linear_cross_entropy ## Testing Done - Hardware Type: NVIDIA A100-SXM4-40GB...