松本和真

Results 14 comments of 松本和真

I am coding now, but it's my first time contributing to transformers or any other OSS. I may ask you for some help.

I still have an error at modeling_diffllama.py@377 in apply_rotary_pos_emb. The variable "query_states" must be torch.Size([2, 32, 10, 128]), but it is torch.Size([2, 64, 10, 64]). I need to change "query_states"...
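To make the shape mismatch concrete, here is a minimal sketch of one way the query tensor could be reshaped before the rotary embedding. The head layout (pairs of half-size heads stored next to each other) and this fix are assumptions for illustration, not the change actually made in the PR.

```python
import torch

# Shapes taken from the comment above: differential attention produces twice as many
# heads, each with half the head dimension.
batch, num_heads, seq_len, head_dim = 2, 32, 10, 128
query_states = torch.randn(batch, 2 * num_heads, seq_len, head_dim // 2)  # [2, 64, 10, 64]

# Assumed fix: merge each pair of half-size heads back into one full-size head so that
# apply_rotary_pos_emb sees [batch, num_heads, seq_len, head_dim] == [2, 32, 10, 128].
# This assumes the two halves of a head are adjacent along the head axis.
query_states = query_states.reshape(batch, num_heads, 2, seq_len, head_dim // 2)
query_states = query_states.transpose(2, 3).reshape(batch, num_heads, seq_len, head_dim)
print(query_states.shape)  # torch.Size([2, 32, 10, 128])
```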

I've finished the normal/eager attention, and I can run it with AutoModelForCausalLM.generate(). Next I'll adapt it for FlashAttention2 and SDPA attention.
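For reference, this is the kind of smoke test I mean; the checkpoint path is a placeholder, not a real released model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/diffllama-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    attn_implementation="eager",  # only the eager attention is finished at this point
)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```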

I also fixed the code to fit modular transformers.
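As a rough sketch of what "fitting modular transformers" means here: the model gets a `modular_diffllama.py` that subclasses the Llama components, and the converter generates the flat `modeling_diffllama.py` from it. The class names and which components are reused are assumptions; the actual file in the PR may differ.

```python
# File: src/transformers/models/diffllama/modular_diffllama.py (sketch, assumed layout)
from ..llama.modeling_llama import LlamaMLP, LlamaRMSNorm


class DiffLlamaMLP(LlamaMLP):
    # Inherit the Llama MLP unchanged; the modular converter copies its code
    # into the generated modeling_diffllama.py.
    pass


class DiffLlamaRMSNorm(LlamaRMSNorm):
    pass
```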

@bzantium I found that the attention was still implemented differently from the paper as of e072544a3bfc69b8a903e062729f861108ffecd3, so I'll revert to e072544a3bfc69b8a903e062729f861108ffecd3 and re-implement it in your suggested code style.

If the FlashAttention2 commit is included: DiffLlamaAttention and DiffLlamaSdpaAttention output the same tensor. DiffLlamaFlashAttention2 cannot work alone, just as the original LlamaFlashAttention2 cannot work alone. But DiffLlamaForCausalLM with eager and one...
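The eager/SDPA comparison I mean is along these lines; the checkpoint path is a placeholder and the prompt is arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/diffllama-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("The quick brown fox", return_tensors="pt")

logits = {}
for impl in ("eager", "sdpa"):
    model = AutoModelForCausalLM.from_pretrained(checkpoint, attn_implementation=impl)
    with torch.no_grad():
        logits[impl] = model(**inputs).logits

# With identical weights and fp32 math, the two implementations should match closely.
print(torch.allclose(logits["eager"], logits["sdpa"], atol=1e-5))
```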

Thanks for re-implementing. I had checked that the DiffLlama model produces roughly the same results with all three attention implementations. I think that because of small differences between the libraries, flash-attn and torch.nn.functional, they don't...
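Since flash-attn only runs on GPU in half precision, some numerical drift against eager/SDPA is expected, so the comparison needs relaxed tolerances. A sketch of that check (placeholder checkpoint, assumed tolerance values):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/diffllama-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda")

logits = {}
for impl in ("sdpa", "flash_attention_2"):
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, attn_implementation=impl, torch_dtype=torch.bfloat16
    ).to("cuda")
    with torch.no_grad():
        logits[impl] = model(**inputs).logits

# bf16 flash-attn vs. bf16 SDPA: compare with loose tolerances rather than exact equality.
torch.testing.assert_close(logits["sdpa"], logits["flash_attention_2"], atol=3e-2, rtol=3e-2)
```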

Sorry, I think I was wrong and your interpretation is correct. I'll revert.

To pass test_initialization and test_mismatched_shapes_have_properly_initialized_weights, I want to change/add to the code in `tests/test_modeling_common.py`. But this is common code. Could I change/add to it like below?...
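As a rough illustration of the kind of change I have in mind (an assumption, not the actual diff): the common initialization check could skip model-specific parameters whose expected init value is neither 0.0 nor 1.0, such as DiffLlama's extra lambda parameters. The parameter names below are assumed.

```python
import torch


def check_initialization(model, special_names=("lambda_q1", "lambda_k1", "lambda_q2", "lambda_k2")):
    """Rough stand-in for the loop inside ModelTesterMixin.test_initialization."""
    for name, param in model.named_parameters():
        if any(special in name for special in special_names):
            continue  # skip parameters with model-specific, non-standard initialization
        if param.requires_grad:
            rounded = ((param.data.mean() * 1e9).round() / 1e9).item()
            assert rounded in (0.0, 1.0), f"{name} seems not properly initialized: mean={rounded}"


# Usage (assuming a DiffLlama model instance):
# check_initialization(DiffLlamaForCausalLM(config))
```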

All tests pass except `tests/utils/test_modeling_utils.py::ModelUtilsTest::test_generation_config_is_loaded_with_model`, which is unrelated to adding this model. Could you review this PR again? And could you tell me how to fix that error? cc: @Cyrilvallez