Liger-Kernel
Efficient Triton Kernels for LLM Training
### 🐛 Describe the bug #369 found that CrossEntropyLoss wasn't applied in post-grad-acc-fix versions of transformers. Although #375 fixed the issue, it didn't consider the revert functions...
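For context, here is a generic sketch of the patch/revert pattern this refers to; every name below is a placeholder for illustration, not Liger's actual monkey-patch or revert helpers.

```python
# Placeholder names only: a generic illustration of why a fix to the patch
# path also has to update the matching revert path.
class ModelingModule:
    """Stand-in for a transformers modeling module."""
    @staticmethod
    def loss_fn(logits, labels):
        return "hf cross-entropy"

_original_loss_fn = ModelingModule.loss_fn          # captured before patching

def apply_patch():
    # Swap in a fused loss (stubbed out here).
    ModelingModule.loss_fn = staticmethod(lambda logits, labels: "liger flce")

def revert_patch():
    # If the patch starts touching new symbols, the revert must restore them
    # too, or later runs silently keep the patched behavior.
    ModelingModule.loss_fn = staticmethod(_original_loss_fn)

apply_patch()
assert ModelingModule.loss_fn(None, None) == "liger flce"
revert_patch()
assert ModelingModule.loss_fn(None, None) == "hf cross-entropy"
```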
### 🚀 The feature, motivation and pitch The FLCE kernel allocates a `grad_weight` tensor: https://github.com/linkedin/Liger-Kernel/blob/a8fa3bb37850e89500261024ff47da0c626ab75f/src/liger_kernel/ops/fused_linear_cross_entropy.py#L47 This tensor is then updated throughout the chunked loss calculation and finally used in the...
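A rough, out-of-kernel sketch of the accumulation pattern being described, assuming a simple fixed-size chunking over the flattened tokens (shapes and the chunking scheme are my assumptions, not the kernel's actual strategy):

```python
import torch

def chunked_flce_grad_weight(hidden, weight, targets, chunk_size=1024):
    """hidden: (N, H), weight: (V, H), targets: (N,); sum-reduced CE."""
    grad_weight = torch.zeros_like(weight)            # the buffer in question
    with torch.no_grad():
        for start in range(0, hidden.shape[0], chunk_size):
            h = hidden[start:start + chunk_size]      # (n, H) chunk
            t = targets[start:start + chunk_size]     # (n,)
            logits = h @ weight.t()                   # (n, V), never kept whole
            dlogits = torch.softmax(logits.float(), dim=-1)
            dlogits[torch.arange(t.numel()), t] -= 1.0   # dL/dlogits for CE(sum)
            grad_weight += dlogits.to(h.dtype).t() @ h   # accumulate per chunk
    return grad_weight

# Sanity check against autograd on a tiny problem:
N, H, V = 8, 16, 32
hidden, weight = torch.randn(N, H), torch.randn(V, H, requires_grad=True)
targets = torch.randint(0, V, (N,))
torch.nn.functional.cross_entropy(hidden @ weight.t(), targets, reduction="sum").backward()
assert torch.allclose(weight.grad, chunked_flce_grad_weight(hidden, weight, targets), atol=1e-4)
```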
### 🐛 Describe the bug I am training the `meta-llama/Llama-3.2-1B` model using **LLaMA-Factory** with the following YAML configuration:

```yaml
### model
model_name_or_path: meta-llama/Llama-3.2-1B

### method
stage: pt
do_train: true
do_eval:...
```
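For anyone trying to reproduce this outside LLaMA-Factory, my understanding is that enabling Liger for a Llama model amounts to calling the model-specific patch before loading the model; the sketch below assumes the `apply_liger_kernel_to_llama` entry point with default kernel options.

```python
# Minimal reproduction scaffold (my assumption of the equivalent manual setup;
# the YAML above drives LLaMA-Factory, this bypasses it).
from transformers import AutoModelForCausalLM, AutoTokenizer
from liger_kernel.transformers import apply_liger_kernel_to_llama

apply_liger_kernel_to_llama()  # patch Llama modules with Liger kernels first

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
# ...then run the same pre-training step that triggers the bug.
```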
### 🚀 The feature, motivation and pitch In [Accelerating Direct Preference Optimization with Prefix Sharing](https://arxiv.org/html/2410.20305v2), the authors propose an efficient way to reduce total training tokens in paired preference optimization...
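A back-of-the-envelope sketch of the saving this targets (the lengths are made up; the masking detail is only hinted at in a comment):

```python
# Illustration only: why sharing the prompt prefix between the chosen and
# rejected sequences shrinks the per-pair token count.
prompt_len, chosen_len, rejected_len = 512, 128, 96   # dummy lengths

# Standard paired batch: the prompt is encoded twice, once per response.
standard = (prompt_len + chosen_len) + (prompt_len + rejected_len)

# Prefix sharing: one prompt copy; both responses attend to it, while a
# block attention mask keeps the two responses from attending to each other.
shared = prompt_len + chosen_len + rejected_len

print(standard, shared)   # 1248 vs 736 tokens per preference pair
```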
### 🐛 Describe the bug #### **Description** When using `LigerFusedLinearCrossEntropyLoss` (Liger FLCE) from the Liger kernel to replace `torch.nn.CrossEntropyLoss`, the training loss becomes unstable and diverges after reaching a certain...
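A minimal comparison scaffold for this kind of report, assuming the module-level API where the loss takes the `lm_head` weight plus flattened hidden states instead of materialized logits (the argument order shown is my reading of the API and may differ across versions):

```python
import torch
from liger_kernel.transformers import LigerFusedLinearCrossEntropyLoss

torch.manual_seed(0)
B, T, H, V = 2, 16, 64, 128
hidden = torch.randn(B * T, H, device="cuda", dtype=torch.bfloat16)
lm_head = torch.nn.Linear(H, V, bias=False, device="cuda", dtype=torch.bfloat16)
targets = torch.randint(0, V, (B * T,), device="cuda")

# Baseline: materialize full logits, then torch.nn.CrossEntropyLoss.
baseline = torch.nn.CrossEntropyLoss()(lm_head(hidden).float(), targets)

# Liger FLCE: pass the projection weight and hidden states; the full logits
# tensor is never materialized. (Argument order assumed; check your version.)
fused = LigerFusedLinearCrossEntropyLoss()(lm_head.weight, hidden, targets)

print(baseline.item(), fused.item())  # these should track each other closely
```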
### 🚀 The feature, motivation and pitch Are you considering supporting the internlm models with the Liger kernel in the near future? https://huggingface.co/internlm ### Alternatives _No response_ ### Additional context _No response_
### 🚀 The feature, motivation and pitch Allow passing a weighting tensor to weight the cross-entropy loss, similar to C-RLFT, where some tokens or inputs in the batch may have...
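What the request amounts to, sketched in plain PyTorch rather than as a Liger API (the function name and the normalization choice are mine):

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, token_weights, ignore_index=-100):
    """logits: (N, V), targets: (N,), token_weights: (N,) per-token scales."""
    per_token = F.cross_entropy(logits, targets, ignore_index=ignore_index,
                                reduction="none")
    mask = (targets != ignore_index).float()
    weighted = per_token * token_weights * mask
    # Normalize by the total weight so the scale stays comparable to mean CE.
    return weighted.sum() / (token_weights * mask).sum().clamp_min(1e-8)

logits = torch.randn(8, 32)
targets = torch.randint(0, 32, (8,))
weights = torch.tensor([1.0, 1.0, 0.5, 0.5, 2.0, 2.0, 1.0, 1.0])
print(weighted_cross_entropy(logits, targets, weights))
```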
### 🚀 The feature, motivation and pitch Hey team, I've been exploring Liger-Kernel's optimizations for decoder models like GPT, and I'm curious about extending these benefits to encoder models such...
Hi, thanks for the library! Today I came across a paper https://openreview.net/forum?id=E4Fk3YuG56 (code: https://github.com/apple/ml-cross-entropy), which seems to discuss a way to compute cross entropy. I'm sharing it here in case...
### 🚀 The feature, motivation and pitch There's softcapping in the FusedLinearCrossEntropy; it would be nice to have this natively for PreferenceBase too. ### Alternatives _No response_ ### Additional context...
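For reference, softcapping here means the tanh bound applied to logits before the loss, as used by Gemma-2-style models; a minimal sketch in plain PyTorch (the function name is mine, not a Liger API):

```python
import torch

def softcap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly bounds logits to (-cap, cap) before the loss/softmax.
    return cap * torch.tanh(logits / cap)

x = torch.tensor([-50.0, -5.0, 0.0, 5.0, 50.0])
print(softcap(x, cap=30.0))   # extreme logits are compressed toward +/-30
```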