Zhiyuan Li
@conceptofmind Hi, I'm closing this issue since we have implemented most of your request; feel free to open a new PR if you run into any issues. https://github.com/fla-org/flash-linear-attention/blob/main/fla/modules/activations.py
I see that everyone is talking about HOS, and I think it's time to support the next system directly, since Huawei may export its latest system capabilities overseas...
I have the same issue, however I'm using the pip version.

```python
result = torch.matmul(x, x_trans)
```

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: could not create an...
```
Yes, but the drivers and the associated oneAPI packages in the documentation will be installed against this version, on the premise of a fresh install (their dependencies permitting)...
> [@Triang-jyed-driung](https://github.com/Triang-jyed-driung) Hi, I think a better way is to modify the config file for the pretrained ckpt. Have you tried
>
> ```python
> AutoModelForCausalLM.from_pretrained(args.model_name, trust_remote_code=True, fuse_norm=True)
> ```

Yes, the `fuse_norm` configuration...
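For reference, a minimal sketch of the config-file approach suggested above, using the standard transformers pattern of editing the loaded config before instantiating the model. The model name is a placeholder, and `fuse_norm` is assumed to be a valid field of this model's config, per the discussion:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# Load the pretrained config, flip the flag, then pass the edited config back in.
# "fla-org/model-name" is a placeholder for the actual checkpoint.
config = AutoConfig.from_pretrained("fla-org/model-name", trust_remote_code=True)
config.fuse_norm = True

model = AutoModelForCausalLM.from_pretrained(
    "fla-org/model-name",
    config=config,
    trust_remote_code=True,
)
```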
Maybe related to https://github.com/fla-org/flash-linear-attention/pull/401
Hi, I have a PR: https://github.com/Lightning-AI/pytorch-lightning/pull/20349/ This PR allows registration of third-party plugins with minimal changes, instead of integrating third-party devices directly into Lightning. I think this can reduce the...
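For anyone curious what such a plugin looks like, here is a minimal sketch of a custom accelerator defined outside Lightning, following the Lightning 2.x `Accelerator` interface. The class name and the CUDA-backed method bodies are stand-ins for a real third-party device backend, not the PR's actual code:

```python
import torch
from lightning.pytorch.accelerators import Accelerator


class MyDeviceAccelerator(Accelerator):
    """Sketch of a third-party accelerator plugin (CUDA used as a stand-in backend)."""

    def setup_device(self, device: torch.device) -> None:
        # Bind the current process to the target device.
        torch.cuda.set_device(device)

    def teardown(self) -> None:
        torch.cuda.empty_cache()

    def get_device_stats(self, device: torch.device) -> dict:
        return torch.cuda.memory_stats(device)

    @staticmethod
    def parse_devices(devices):
        return devices

    @staticmethod
    def get_parallel_devices(devices):
        return [torch.device("cuda", i) for i in range(devices)]

    @staticmethod
    def auto_device_count() -> int:
        return torch.cuda.device_count()

    @staticmethod
    def is_available() -> bool:
        return torch.cuda.is_available()
```

With something like this in place, a trainer can be constructed as `Trainer(accelerator=MyDeviceAccelerator(), devices=1)` without Lightning itself needing to know about the vendor backend.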
> > @hipudding @hhllxx1121 More and more device manufacturers need to support their own backends. Can this be achieved by using the PrivateUse1 mechanism?
> >
> > As far as I...
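For context on the PrivateUse1 mechanism mentioned above: it is PyTorch's reserved dispatch key for out-of-tree device backends. Below is a minimal sketch of the Python-side registration, with `my_device` as a made-up backend name; a real backend would additionally register its kernels under the PrivateUse1 key in C++:

```python
import torch

# Rename the reserved PrivateUse1 dispatch key so tensors report "my_device".
torch.utils.rename_privateuse1_backend("my_device")


# Expose a device module so torch.my_device.* behaves like torch.cuda.*.
# In a real backend this would wrap the vendor runtime; here it is a stub.
class _MyDeviceModule:
    @staticmethod
    def is_available() -> bool:
        return True

    @staticmethod
    def device_count() -> int:
        return 1


torch._register_device_module("my_device", _MyDeviceModule)

# Generate Tensor.my_device() / Tensor.is_my_device style helper methods.
torch.utils.generate_methods_for_privateuse1_backend()
```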
I think it's because of `tl.int32`? How can I reproduce this error?
Changed [RWKV7](https://github.com/fla-org/flash-linear-attention/pull/365) to the official initialization.