松本和真

Results 14 comments of 松本和真

I am coding now, but it's my first time contributing to transformers or any other OSS. I may ask you for some help.

I still have an error at modeling_diffllama.py@377 in apply_rotary_pos_emb. The variable "query_states" must be torch.Size([2, 32, 10, 128]), but it is torch.Size([2, 64, 10, 64]). I need to change "query_states"...
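To make the shape mismatch concrete, here is a minimal sketch of one way the query tensor could be reshaped before the rotary embedding. The head layout (pairs of half-size heads stored next to each other) and this fix are assumptions for illustration, not the change actually made in the PR.

```python
import torch

# Shapes taken from the comment above: differential attention produces twice as many
# heads, each with half the head dimension.
batch, num_heads, seq_len, head_dim = 2, 32, 10, 128
query_states = torch.randn(batch, 2 * num_heads, seq_len, head_dim // 2)  # [2, 64, 10, 64]

# Assumed fix: merge each pair of half-size heads back into one full-size head so that
# apply_rotary_pos_emb sees [batch, num_heads, seq_len, head_dim] == [2, 32, 10, 128].
# This assumes the two halves of a head are adjacent along the head axis.
query_states = query_states.reshape(batch, num_heads, 2, seq_len, head_dim // 2)
query_states = query_states.transpose(2, 3).reshape(batch, num_heads, seq_len, head_dim)
print(query_states.shape)  # torch.Size([2, 32, 10, 128])
```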

I've finished the normal/eager attention, and I can run it with AutoModelForCausalLM.generate(). Next I'll adapt it for FlashAttention2 and SDPA attention.
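For reference, this is the kind of smoke test I mean; the checkpoint path is a placeholder, not a real released model.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/diffllama-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint,
    attn_implementation="eager",  # only the eager attention is finished at this point
)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```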

I also fixed the code to fit modular transformers.
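As a rough sketch of what "fitting modular transformers" means here: the model gets a `modular_diffllama.py` that subclasses the Llama components, and the converter generates the flat `modeling_diffllama.py` from it. The class names and which components are reused are assumptions; the actual file in the PR may differ.

```python
# File: src/transformers/models/diffllama/modular_diffllama.py (sketch, assumed layout)
from ..llama.modeling_llama import LlamaMLP, LlamaRMSNorm


class DiffLlamaMLP(LlamaMLP):
    # Inherit the Llama MLP unchanged; the modular converter copies its code
    # into the generated modeling_diffllama.py.
    pass


class DiffLlamaRMSNorm(LlamaRMSNorm):
    pass
```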

@bzantium I found that the attention was still implemented differently from the paper as of e072544a3bfc69b8a903e062729f861108ffecd3, so I'll revert to e072544a3bfc69b8a903e062729f861108ffecd3 and re-implement it in your suggested code style.

If the FlashAttention2 commit is included: DiffLlamaAttention and DiffLlamaSdpaAttention output the same tensor. DiffLlamaFlashAttention2 cannot work alone, just as the original LlamaFlashAttention2 cannot work alone. But DiffLlamaForCausalLM with eager and one...
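The eager/SDPA comparison I mean is along these lines; the checkpoint path is a placeholder and the prompt is arbitrary.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/diffllama-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("The quick brown fox", return_tensors="pt")

logits = {}
for impl in ("eager", "sdpa"):
    model = AutoModelForCausalLM.from_pretrained(checkpoint, attn_implementation=impl)
    with torch.no_grad():
        logits[impl] = model(**inputs).logits

# With identical weights and fp32 math, the two implementations should match closely.
print(torch.allclose(logits["eager"], logits["sdpa"], atol=1e-5))
```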

Thanks for re-implementing. I had checked that the DiffLlama model produces roughly the same results with all three attention implementations. I think that because of small differences between the libraries, flash-attn and torch.nn.functional, they don't...
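Since flash-attn only runs on GPU in half precision, some numerical drift against eager/SDPA is expected, so the comparison needs relaxed tolerances. A sketch of that check (placeholder checkpoint, assumed tolerance values):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/diffllama-checkpoint"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("The quick brown fox", return_tensors="pt").to("cuda")

logits = {}
for impl in ("sdpa", "flash_attention_2"):
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, attn_implementation=impl, torch_dtype=torch.bfloat16
    ).to("cuda")
    with torch.no_grad():
        logits[impl] = model(**inputs).logits

# bf16 flash-attn vs. bf16 SDPA: compare with loose tolerances rather than exact equality.
torch.testing.assert_close(logits["sdpa"], logits["flash_attention_2"], atol=3e-2, rtol=3e-2)
```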

Sorry, I think I was wrong and your interpretation is correct. I'll revert.

To pass test_initialization and test_mismatched_shapes_have_properly_initialized_weights, I want to change/add to the code in `tests/test_modeling_common.py`. But this is common code. Could I change/add to it like below?...
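As a rough illustration of the kind of change I have in mind (an assumption, not the actual diff): the common initialization check could skip model-specific parameters whose expected init value is neither 0.0 nor 1.0, such as DiffLlama's extra lambda parameters. The parameter names below are assumed.

```python
import torch


def check_initialization(model, special_names=("lambda_q1", "lambda_k1", "lambda_q2", "lambda_k2")):
    """Rough stand-in for the loop inside ModelTesterMixin.test_initialization."""
    for name, param in model.named_parameters():
        if any(special in name for special in special_names):
            continue  # skip parameters with model-specific, non-standard initialization
        if param.requires_grad:
            rounded = ((param.data.mean() * 1e9).round() / 1e9).item()
            assert rounded in (0.0, 1.0), f"{name} seems not properly initialized: mean={rounded}"


# Usage (assuming a DiffLlama model instance):
# check_initialization(DiffLlamaForCausalLM(config))
```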

All tests pass except `tests/utils/test_modeling_utils.py::ModelUtilsTest::test_generation_config_is_loaded_with_model`, which is unrelated to adding this model. Could you review this PR again? And could you tell me how to fix that error? cc: @Cyrilvallez