
[RFC] Use each model's official initialization instead of a unified initialization

Open · sustcsonglin opened this issue 8 months ago · 2 comments

Proposal

Use each model's official initialization instead of a unified initialization

Rationale

Related issues: https://github.com/fla-org/flash-linear-attention/issues/220 and https://github.com/fla-org/flash-linear-attention/issues/266

sustcsonglin · Apr 01 '25

Consider restructuring the code: introduce base classes FLAModel and FLAForCausalLM that provide the default initialization and forward pass. Downstream models inherit from these base classes and override those methods only when necessary.
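
A minimal sketch of that structure, assuming a transformers-style `PreTrainedModel` hierarchy (the `_init_weights` hook is the standard transformers override point; the std value and the `MyModel` subclass are illustrative, not part of the proposal):

```python
import torch.nn as nn
from transformers import PreTrainedModel


class FLAModel(PreTrainedModel):
    """Shared base: provides the default (unified) initialization and forward."""

    def _init_weights(self, module):
        # Unified fallback, applied when a model defines no official scheme.
        if isinstance(module, (nn.Linear, nn.Embedding)):
            nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if isinstance(module, nn.Linear) and module.bias is not None:
            nn.init.zeros_(module.bias)


class FLAForCausalLM(FLAModel):
    """Causal-LM base: inherits the default init and forward unchanged."""


class MyModel(FLAForCausalLM):
    """Downstream model: overrides only where its official recipe differs."""

    def _init_weights(self, module):
        # Replace the unified scheme with the model's official one here.
        super()._init_weights(module)
```

This keeps the unified scheme as a fallback while letting each model opt into its official recipe, which is exactly what the RFC asks for.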

Triang-jyed-driung · Apr 05 '25

Changed RWKV7 to use its official initialization.
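
For reference, a hedged sketch of what such an override could look like on top of the base classes proposed above; the specific choices below (small uniform embedding init, orthogonal linear init) only approximate the upstream RWKV reference implementation and should be checked against the actual change:

```python
import torch.nn as nn


class RWKV7ForCausalLM(FLAForCausalLM):  # base class from the sketch above
    def _init_weights(self, module):
        # Assumption: RWKV's reference code initializes embeddings to small
        # uniform values and most 2-D weights orthogonally; the exact
        # per-parameter rules should be taken from the actual commit.
        if isinstance(module, nn.Embedding):
            nn.init.uniform_(module.weight, a=-1e-4, b=1e-4)
        elif isinstance(module, nn.Linear):
            nn.init.orthogonal_(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
```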

zhiyuan1i · Apr 17 '25