Liger-Kernel
Efficient Triton Kernels for LLM Training
### 🐛 Describe the bug #369 found that CrossEntropyLoss wasn't applied in post-grad-acc-fix versions of transformers. Although #375 fixed the issue, it didn't consider the revert functions...
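For context, here is a generic sketch of the patch/revert pattern this refers to; every name below is a placeholder for illustration, not Liger's actual monkey-patch or revert helpers.

```python
# Placeholder names only: a generic illustration of why a fix to the patch
# path also has to update the matching revert path.
class ModelingModule:
    """Stand-in for a transformers modeling module."""
    @staticmethod
    def loss_fn(logits, labels):
        return "hf cross-entropy"

_original_loss_fn = ModelingModule.loss_fn          # captured before patching

def apply_patch():
    # Swap in a fused loss (stubbed out here).
    ModelingModule.loss_fn = staticmethod(lambda logits, labels: "liger flce")

def revert_patch():
    # If the patch starts touching new symbols, the revert must restore them
    # too, or later runs silently keep the patched behavior.
    ModelingModule.loss_fn = staticmethod(_original_loss_fn)

apply_patch()
assert ModelingModule.loss_fn(None, None) == "liger flce"
revert_patch()
assert ModelingModule.loss_fn(None, None) == "hf cross-entropy"
```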
### 🚀 The feature, motivation and pitch The FLCE kernel allocates a `grad_weight` tensor: https://github.com/linkedin/Liger-Kernel/blob/a8fa3bb37850e89500261024ff47da0c626ab75f/src/liger_kernel/ops/fused_linear_cross_entropy.py#L47 This tensor is then updated throughout the chunked loss calculation and finally used in the...
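A rough, out-of-kernel sketch of the accumulation pattern being described, assuming a simple fixed-size chunking over the flattened tokens (shapes and the chunking scheme are my assumptions, not the kernel's actual strategy):

```python
import torch

def chunked_flce_grad_weight(hidden, weight, targets, chunk_size=1024):
    """hidden: (N, H), weight: (V, H), targets: (N,); sum-reduced CE."""
    grad_weight = torch.zeros_like(weight)            # the buffer in question
    with torch.no_grad():
        for start in range(0, hidden.shape[0], chunk_size):
            h = hidden[start:start + chunk_size]      # (n, H) chunk
            t = targets[start:start + chunk_size]     # (n,)
            logits = h @ weight.t()                   # (n, V), never kept whole
            dlogits = torch.softmax(logits.float(), dim=-1)
            dlogits[torch.arange(t.numel()), t] -= 1.0   # dL/dlogits for CE(sum)
            grad_weight += dlogits.to(h.dtype).t() @ h   # accumulate per chunk
    return grad_weight

# Sanity check against autograd on a tiny problem:
N, H, V = 8, 16, 32
hidden, weight = torch.randn(N, H), torch.randn(V, H, requires_grad=True)
targets = torch.randint(0, V, (N,))
torch.nn.functional.cross_entropy(hidden @ weight.t(), targets, reduction="sum").backward()
assert torch.allclose(weight.grad, chunked_flce_grad_weight(hidden, weight, targets), atol=1e-4)
```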
### 🐛 Describe the bug I am training the `meta-llama/Llama-3.2-1B` model using **LLaMA-Factory** with the following YAML configuration:

```yaml
### model
model_name_or_path: meta-llama/Llama-3.2-1B

### method
stage: pt
do_train: true
do_eval:...
```
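For anyone trying to reproduce this outside LLaMA-Factory, my understanding is that enabling Liger for a Llama model amounts to calling the model-specific patch before loading the model; the sketch below assumes the `apply_liger_kernel_to_llama` entry point with default kernel options.

```python
# Minimal reproduction scaffold (my assumption of the equivalent manual setup;
# the YAML above drives LLaMA-Factory, this bypasses it).
from transformers import AutoModelForCausalLM, AutoTokenizer
from liger_kernel.transformers import apply_liger_kernel_to_llama

apply_liger_kernel_to_llama()  # patch Llama modules with Liger kernels first

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
# ...then run the same pre-training step that triggers the bug.
```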
### 🚀 The feature, motivation and pitch In [Accelerating Direct Preference Optimization with Prefix Sharing](https://arxiv.org/html/2410.20305v2), the authors propose an efficient way to reduce total training tokens in paired preference optimization...
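A back-of-the-envelope sketch of the saving this targets (the lengths are made up; the masking detail is only hinted at in a comment):

```python
# Illustration only: why sharing the prompt prefix between the chosen and
# rejected sequences shrinks the per-pair token count.
prompt_len, chosen_len, rejected_len = 512, 128, 96   # dummy lengths

# Standard paired batch: the prompt is encoded twice, once per response.
standard = (prompt_len + chosen_len) + (prompt_len + rejected_len)

# Prefix sharing: one prompt copy; both responses attend to it, while a
# block attention mask keeps the two responses from attending to each other.
shared = prompt_len + chosen_len + rejected_len

print(standard, shared)   # 1248 vs 736 tokens per preference pair
```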
### 🐛 Describe the bug #### **Description** When using `LigerFusedLinearCrossEntropyLoss` (Liger FLCE) from the Liger kernel to replace `torch.nn.CrossEntropyLoss`, the training loss becomes unstable and diverges after reaching a certain...
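A minimal comparison scaffold for this kind of report, assuming the module-level API where the loss takes the `lm_head` weight plus flattened hidden states instead of materialized logits (the argument order shown is my reading of the API and may differ across versions):

```python
import torch
from liger_kernel.transformers import LigerFusedLinearCrossEntropyLoss

torch.manual_seed(0)
B, T, H, V = 2, 16, 64, 128
hidden = torch.randn(B * T, H, device="cuda", dtype=torch.bfloat16)
lm_head = torch.nn.Linear(H, V, bias=False, device="cuda", dtype=torch.bfloat16)
targets = torch.randint(0, V, (B * T,), device="cuda")

# Baseline: materialize full logits, then torch.nn.CrossEntropyLoss.
baseline = torch.nn.CrossEntropyLoss()(lm_head(hidden).float(), targets)

# Liger FLCE: pass the projection weight and hidden states; the full logits
# tensor is never materialized. (Argument order assumed; check your version.)
fused = LigerFusedLinearCrossEntropyLoss()(lm_head.weight, hidden, targets)

print(baseline.item(), fused.item())  # these should track each other closely
```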
### 🚀 The feature, motivation and pitch Are you considering supporting the internlm models with the Liger kernel in the near future? https://huggingface.co/internlm ### Alternatives _No response_ ### Additional context _No response_
### 🚀 The feature, motivation and pitch Allow passing a weighting tensor to weight the cross-entropy loss, similar to C-RLFT, where some tokens or inputs in the batch may have...
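What the request amounts to, sketched in plain PyTorch rather than as a Liger API (the function name and the normalization choice are mine):

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, targets, token_weights, ignore_index=-100):
    """logits: (N, V), targets: (N,), token_weights: (N,) per-token scales."""
    per_token = F.cross_entropy(logits, targets, ignore_index=ignore_index,
                                reduction="none")
    mask = (targets != ignore_index).float()
    weighted = per_token * token_weights * mask
    # Normalize by the total weight so the scale stays comparable to mean CE.
    return weighted.sum() / (token_weights * mask).sum().clamp_min(1e-8)

logits = torch.randn(8, 32)
targets = torch.randint(0, 32, (8,))
weights = torch.tensor([1.0, 1.0, 0.5, 0.5, 2.0, 2.0, 1.0, 1.0])
print(weighted_cross_entropy(logits, targets, weights))
```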
### 🚀 The feature, motivation and pitch Hey team, I've been exploring Liger-Kernel's optimizations for decoder models like GPT, and I'm curious about extending these benefits to encoder models such...
Hi, thanks for the library! Today I came across a paper https://openreview.net/forum?id=E4Fk3YuG56 (code: https://github.com/apple/ml-cross-entropy), which seems to discuss a way to compute cross entropy. I'm sharing it here in case...
### 🚀 The feature, motivation and pitch There's softcapping in the FusedLinearCrossEntropy; it would be nice to have this natively for PreferenceBase too. ### Alternatives _No response_ ### Additional context...
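For reference, softcapping here means the tanh bound applied to logits before the loss, as used by Gemma-2-style models; a minimal sketch in plain PyTorch (the function name is mine, not a Liger API):

```python
import torch

def softcap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # Smoothly bounds logits to (-cap, cap) before the loss/softmax.
    return cap * torch.tanh(logits / cap)

x = torch.tensor([-50.0, -5.0, 0.0, 5.0, 50.0])
print(softcap(x, cap=30.0))   # extreme logits are compressed toward +/-30
```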