Liger-Kernel issues

Fix Setup.py about "building from source on ROCM"

Summary: Closes https://github.com/linkedin/Liger-Kernel/issues/538. Running `pip3 install -e .[dev]` succeeded on my CPU and H100 machines; help is needed to test it on ROCm. Testing Done: - Hardware Type: -...

hebiao064
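
The ROCm fix above concerns how `setup.py` picks dependencies for the machine it is building on. The sketch below shows one hypothetical way a build script could guess the platform, by checking whether the vendor tools `nvidia-smi` or `rocm-smi` are on `PATH`. The function name and the heuristic are assumptions for illustration, not the change made in this PR.

```python
import shutil
from typing import Callable, Optional


def detect_gpu_platform(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Guess 'cuda', 'rocm', or 'cpu' from vendor CLIs on PATH (hypothetical heuristic)."""
    # NVIDIA driver installs provide nvidia-smi
    if which("nvidia-smi"):
        return "cuda"
    # ROCm installs provide rocm-smi
    if which("rocm-smi"):
        return "rocm"
    # no GPU tooling found: fall back to a CPU-only dependency set
    return "cpu"


print(detect_gpu_platform())
```

A build script could then map the returned string to a platform-specific requirement list. Passing `which` as a parameter keeps the function testable on machines without a GPU; checking NVIDIA first is an arbitrary tie-break for hosts that have both toolchains installed.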

added batch norm

Summary: Added batch norm. Testing Done: I compared it against Keras's batch norm, using a 4090 GPU. - Hardware Type: - [x] run `make test` to ensure correctness...

vulkomilev
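
Checking a batch norm kernel against another framework usually comes down to comparing with a plain reference. Below is a minimal pure-Python sketch of training-mode batch norm over an (N, C) input, written here for illustration; it is not the PR's kernel. One detail worth matching when comparing against Keras: `keras.layers.BatchNormalization` defaults to `epsilon=1e-3`, while PyTorch's `BatchNorm1d` defaults to `eps=1e-5`, so outputs only agree closely when both sides use the same epsilon.

```python
import math
from typing import List, Sequence


def batch_norm_forward(
    x: Sequence[Sequence[float]],
    gamma: Sequence[float],
    beta: Sequence[float],
    eps: float = 1e-5,
) -> List[List[float]]:
    """Training-mode batch norm over an (N, C) input, normalizing each feature column."""
    n, c = len(x), len(x[0])
    out = [[0.0] * c for _ in range(n)]
    for j in range(c):
        col = [row[j] for row in x]
        mean = sum(col) / n
        # biased variance (divide by N) is what training-mode normalization uses
        var = sum((v - mean) ** 2 for v in col) / n
        inv_std = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = gamma[j] * (x[i][j] - mean) * inv_std + beta[j]
    return out


x = [[1.0, 2.0], [3.0, 6.0]]
print(batch_norm_forward(x, gamma=[1.0, 1.0], beta=[0.0, 0.0], eps=0.0))
# prints [[-1.0, -1.0], [1.0, 1.0]]
```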

Add Flex Attention Monkey Patch for LLAMA

Summary: We need flex attention for custom attention masks to achieve better performance (for example, [shared prefix](https://github.com/frankxwang/dpo-prefix-sharing)). There are two ways to enable flex attention in Liger: 1. Set the `attn_implementation` of...

austin362667
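
The shared-prefix case mentioned above can be expressed as an attention mask over a packed sequence `[prefix | completion_0 | completion_1 | ...]`: every token attends causally to the prefix, but never to another completion. PyTorch's FlexAttention takes such masks as a `mask_mod(b, h, q_idx, kv_idx)` callable. The pure-Python sketch below, with hypothetical helper names, builds that function; it is not this PR's code. With real FlexAttention the indices arrive as tensors, so `seg` would have to be a tensor on the same device; the `&` and `|` operators are used so the expression also works elementwise on tensors.

```python
from typing import Callable, List, Sequence, Tuple

MaskMod = Callable[[int, int, int, int], bool]


def make_shared_prefix_mask_mod(
    prefix_len: int, completion_lens: Sequence[int]
) -> Tuple[MaskMod, int]:
    """Build a mask_mod for a packed [prefix | completion_0 | completion_1 | ...] sequence."""
    # segment id per position: -1 marks the shared prefix, k marks completion k
    seg: List[int] = [-1] * prefix_len
    for k, length in enumerate(completion_lens):
        seg.extend([k] * length)

    def mask_mod(b: int, h: int, q_idx: int, kv_idx: int) -> bool:
        causal = kv_idx <= q_idx
        # a query may see the shared prefix or earlier tokens of its own completion
        visible = (seg[kv_idx] == -1) | (seg[kv_idx] == seg[q_idx])
        return causal & visible

    return mask_mod, len(seg)


mask_mod, seq_len = make_shared_prefix_mask_mod(prefix_len=2, completion_lens=[2, 1])
for q in range(seq_len):
    print("".join("x" if mask_mod(0, 0, q, kv) else "." for kv in range(seq_len)))
```

The printed rows are `x....`, `xx...`, `xxx..`, `xxxx.` and `xx..x`: the last token (completion 1) sees the prefix and itself but not completion 0. A tensor-based variant of this function could be passed to `torch.nn.attention.flex_attention.create_block_mask(mask_mod, B, H, Q_LEN, KV_LEN)` and the result handed to `flex_attention(q, k, v, block_mask=...)`.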

Refactor chunked preference functions and distillation base class

Summary: Remove redundant code by refactoring. Testing Done: - Hardware Type: - [ ] run `make test` to ensure correctness - [x] run `make checkstyle` to ensure code...

shivam15s

test: Add test for ref_input parameter in fused linear preference

This PR adds a test for the `ref_input` parameter that was introduced in #467. Changes: - Add `test_ref_input.py` to verify that the `ref_input` parameter works correctly in `LigerFusedLinearPreferenceBase` - Test...

xingyaoww

Add on-paper form of RoPE kernel

Summary: Implement the on-paper form of the RoPE kernel from [RoFormer](https://arxiv.org/pdf/2104.09864). Unlike the HuggingFace [RoFormer RoPE](https://github.com/huggingface/transformers/blob/v4.46.0/src/transformers/models/roformer/modeling_roformer.py#L309) implementation, this one does not support an optional value input. Details: The code...

Comet0322
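
In the RoFormer paper, a d-dimensional query or key vector is split into d/2 adjacent pairs, and pair i at position m is rotated by the angle m·θ_i with θ_i = 10000^(-2i/d) (i counted from 0). HuggingFace's `rotate_half` layout instead pairs dimension i with dimension i + d/2, so the two forms are not interchangeable without permuting weights. The pure-Python sketch below illustrates the on-paper form under those definitions; it is a reference for the math, not this PR's Triton kernel.

```python
import math
from typing import List, Sequence


def rope_on_paper(x: Sequence[float], pos: int, base: float = 10000.0) -> List[float]:
    """Rotate adjacent pairs (x[2i], x[2i+1]) by pos * theta_i, theta_i = base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("head dimension must be even")
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)
        c, s = math.cos(pos * theta), math.sin(pos * theta)
        x0, x1 = x[2 * i], x[2 * i + 1]
        # 2x2 rotation [[c, -s], [s, c]] applied to the pair (x0, x1)
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c
    return out


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(u * v for u, v in zip(a, b))


q, k = [0.3, -1.2, 0.5, 2.0], [1.1, 0.4, -0.7, 0.9]
# the attention score depends only on the offset between positions (2 in both cases)
print(dot(rope_on_paper(q, 5), rope_on_paper(k, 3)))
print(dot(rope_on_paper(q, 12), rope_on_paper(k, 10)))
```

Both printed scores agree up to rounding, which is the relative-position property RoPE is designed for; position 0 leaves the vector unchanged and every rotation preserves its norm.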

Implement softcapping in fused jsd

Summary: Implements `softcap` in the fused linear JSD so that it can be used for `gemma2` models. Details: Assumes the same softcap for the teacher and student models. Testing Done...

wheynelau
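
Soft-capping, as Gemma 2 applies it to its final logits, squashes each logit smoothly into the range (-cap, cap) via `cap * tanh(z / cap)` before the softmax. The pure-Python sketch below applies one shared cap to teacher and student logits (matching the assumption stated in the PR) and then computes the symmetric Jensen-Shannon divergence. It is an illustration only: Liger's fused linear JSD works on hidden states and weights, and its divergence weighting may differ from the plain 0.5/0.5 form used here.

```python
import math
from typing import List, Optional, Sequence


def softcap(logits: Sequence[float], cap: float) -> List[float]:
    """Soft-cap logits: cap * tanh(z / cap)."""
    return [cap * math.tanh(z / cap) for z in logits]


def log_softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def log_mix(a: float, b: float) -> float:
    """log((exp(a) + exp(b)) / 2), computed without underflowing to log(0)."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi)) - math.log(2.0)


def jsd(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    cap: Optional[float] = None,
) -> float:
    """Symmetric JSD between teacher (P) and student (Q) distributions."""
    if cap is not None:
        # one cap shared by teacher and student
        student_logits = softcap(student_logits, cap)
        teacher_logits = softcap(teacher_logits, cap)
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    # log of the mixture M = (P + Q) / 2
    log_m = [log_mix(a, b) for a, b in zip(log_p, log_q)]
    kl_pm = sum(math.exp(a) * (a - c) for a, c in zip(log_p, log_m))
    kl_qm = sum(math.exp(b) * (b - c) for b, c in zip(log_q, log_m))
    return 0.5 * (kl_pm + kl_qm)


student, teacher = [2.0, 5.0, -3.0], [1.0, 8.0, 0.5]
print(jsd(student, teacher), jsd(student, teacher, cap=30.0))
```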

[WIP]: Autotune Chunk Size

Summary: Our chunked loss functions currently set the chunk size statically to 1. However, this can leave GPU memory underutilized. In this PR we show how the...

pramodith
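
The chunk size trades peak memory for parallelism: the loss is identical for any chunk size, but larger chunks keep more intermediate tensors alive at once. The pure-Python sketch below shows chunked accumulation of a mean loss plus a hypothetical selection rule (largest power of two that fits an assumed per-row memory estimate). The PR's actual tuning method is not shown in this listing, so `pick_chunk_size` and its parameters are illustrative assumptions only.

```python
from typing import Callable, Sequence


def chunked_mean_loss(
    rows: Sequence[Sequence[float]],
    per_row_loss: Callable[[Sequence[float]], float],
    chunk_size: int,
) -> float:
    """Mean per-row loss, processed chunk_size rows at a time."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be >= 1")
    total = 0.0
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        # in a GPU kernel, only this chunk's intermediate tensors would be alive here
        total += sum(per_row_loss(r) for r in chunk)
    return total / len(rows)


def pick_chunk_size(batch_size: int, bytes_per_row: int, budget_bytes: int) -> int:
    """Hypothetical heuristic: largest power of two <= batch_size that fits the budget."""
    size = 1  # always allow at least one row, even if it exceeds the budget
    while size * 2 <= batch_size and size * 2 * bytes_per_row <= budget_bytes:
        size *= 2
    return size


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
size = pick_chunk_size(len(rows), bytes_per_row=100, budget_bytes=250)
print(size, chunked_mean_loss(rows, sum, size))
# prints: 2 7.0
```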

add batch_norm op with test and benchmark

Summary: Implemented a 2D batch normalization Triton operator, ran the corresponding tests and benchmarks successfully, and plotted the speed and memory benchmark results. Testing Done: - Hardware...

yanghailong-git

[Model] DeepseekV2 Support

Summary: Resolves #129. Adds a monkey patch to support the DeepseekV2 model. Details: Ops patched: - rms_norm - swiglu - cross_entropy - fused_linear_cross_entropy. Testing Done: - Hardware Type: NVIDIA A100-SXM4-40GB...

saurabhkoshatwar
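
Two of the ops patched above, `rms_norm` and `swiglu`, have short reference definitions that are handy when checking a patched model's numerics. The pure-Python sketch below is an assumed reference written for illustration; it is not Liger's Triton kernels or the DeepseekV2 modeling code, and the `eps` default is an assumption (models normally take it from their config).

```python
import math
from typing import List, Sequence


def rms_norm(x: Sequence[float], weight: Sequence[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: scale x by 1/sqrt(mean(x^2) + eps), then by a learned weight (no mean, no bias)."""
    inv_rms = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]


def silu(z: float) -> float:
    # silu(z) = z * sigmoid(z); sigmoid written via tanh to avoid exp overflow
    return z * 0.5 * (1.0 + math.tanh(0.5 * z))


def swiglu(gate: Sequence[float], up: Sequence[float]) -> List[float]:
    """SwiGLU gating of a Llama-style MLP: silu(gate) * up, elementwise."""
    return [silu(g) * u for g, u in zip(gate, up)]


print(rms_norm([3.0, 4.0], [1.0, 1.0]))
print(swiglu([1.0, -2.0], [2.0, 0.5]))
```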
