Yu Zhang
@juankost Closing this as answered by the link you provided. We haven't experimented much with `torch.compile` yet. PRs are welcome if you manage to fix this issue.
@OREYR Hi, I'm not sure what happened here. Could you provide more details about your experimental settings: scheduler, data, model framework, etc.?
@OREYR Does that mean you randomly initialized your model again? For newly initialized models, an lr of `1e-5` is too small.
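A minimal sketch of what I mean, assuming a plain PyTorch setup (the model and lr values here are just illustrative):

```python
import torch

# Hypothetical stand-in for your re-initialized network.
model = torch.nn.Linear(1024, 1024)

# When training from random init, an lr around 1e-4 ~ 1e-3 is more typical;
# 1e-5 is better suited to fine-tuning already-trained weights.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
```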
Thank you for reporting this. I'll look into it.
@OREYR It looks like you wrapped the classifier with LoRA as well, and the original random params are frozen?
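If that's the case, one common fix with `peft` is to exclude the classifier head from LoRA and train it in full via `modules_to_save`. A sketch (the base model and module names here are assumptions; substitute your own):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification

# Hypothetical base model; replace with your own checkpoint.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query", "value"],  # apply LoRA to attention projections only
    modules_to_save=["classifier"],     # train the randomly-init'd head in full, not via LoRA
)
model = get_peft_model(model, config)
```

With `modules_to_save`, the classifier's parameters stay trainable instead of being frozen alongside the rest of the base model.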
@OREYR One thing to confirm: how is the MLP called in your PEFT modules? I wrote some fused kernels in this module to save memory, so please check the implementations to...
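To check, you can list the wrapped module names and classes after applying PEFT (a sketch, assuming `model` is your PEFT-wrapped model):

```python
# Print every submodule whose name mentions "mlp" to see which
# implementation (fused or not) your PEFT wrapper actually calls.
for name, module in model.named_modules():
    if "mlp" in name.lower():
        print(name, type(module).__name__)
```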
@OREYR Could you paste a full, runnable script from which I can observe the abnormal values here?
@sayakpaul FYI, we've released some weights converted from Mistral-7B-v0.1 as described in [arXiv:2405.06640](https://arxiv.org/abs/2405.06640). You can try them by loading `fla-hub/gla-7B-mistral-20B`, `fla-hub/gsa-7B-mistral-20B`, or `fla-hub/gsa-7B-mistral-100B`.
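A loading sketch, assuming the usual pattern where importing `fla` registers the model classes with `transformers` (only the repo names above come from the release; the rest is boilerplate):

```python
import fla  # registers GLA/GSA model classes with transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "fla-hub/gla-7B-mistral-20B"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
```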
@Yingyue-L Hi, see #32 for more throughput comparisons. You may need a larger seq_len to fully unlock the potential of linear attention (LA); as shown in [DeltaNet](https://arxiv.org/abs/2406.06484), LAs do...
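A rough way to see the effect yourself is to time forward passes at growing sequence lengths (a sketch; the helper, vocab size, and lengths are illustrative, and a CUDA device is assumed):

```python
import time
import torch

@torch.no_grad()
def time_forward(model, seq_len, batch_size=1, vocab_size=32000, n_iters=5):
    # Average forward latency over n_iters runs at a given sequence length.
    # Linear attention tends to pull ahead of softmax attention as seq_len grows.
    input_ids = torch.randint(0, vocab_size, (batch_size, seq_len), device="cuda")
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_iters):
        model(input_ids)
    torch.cuda.synchronize()
    return (time.time() - start) / n_iters

# for L in (2048, 8192, 32768):
#     print(L, time_forward(model, L))
```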
@howard-hou Thanks for reporting this issue; we'll look into it soon.