gspeter-max
Ok, fine. Thanks for this, @zucchini-nlp.
```python
import torch

def prepare_fa2_from_position_ids(query, key, value, position_ids):
    query = query.view(-1, query.size(-2), query.size(-1))
    key = key.contiguous().view(-1, key.size(-2), key.size(-1))
    value = value.contiguous().view(-1, value.size(-2), value.size(-1))
    # This is YOUR logic we are...
```
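For context, the part of the linked function that follows the truncation derives flash attention's varlen metadata from `position_ids`. The sketch below is my paraphrase of that logic (the helper name and the example at the bottom are mine, not the library's): sequence starts are taken to be the indices where `position_ids` resets to 0, and those boundaries become the cumulative sequence lengths.

```python
import torch

def cu_seqlens_from_position_ids(position_ids: torch.Tensor):
    # Flatten (batch_size, seq_len) -> (total_tokens,) for the packed batch
    position_ids = position_ids.flatten()
    # Token indices over the flattened batch
    indices_q = torch.arange(position_ids.size(0), device=position_ids.device, dtype=torch.int32)
    # A new sequence starts wherever the position counter resets to 0;
    # append the total token count so the last sequence is closed off
    cu_seqlens = torch.cat(
        (
            indices_q[position_ids == 0],
            torch.tensor(position_ids.size(), device=position_ids.device, dtype=torch.int32),
        )
    )
    # Longest sequence in the packed batch (positions are 0-based)
    max_seqlen = position_ids.max() + 1
    return indices_q, cu_seqlens, max_seqlen

# Example: two packed sequences of lengths 3 and 2
pos = torch.tensor([[0, 1, 2, 0, 1]])
_, cu, mx = cu_seqlens_from_position_ids(pos)
print(cu)  # tensor([0, 3, 5], dtype=torch.int32)
print(mx)  # tensor(3)
```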
Hi @zucchini-nlp, I found a bug in this function: it returns incorrect output. I'm creating a PR to fix it, but I don't have access to the model weights to...
https://github.com/huggingface/transformers/blob/d5d007a1a0f0c11a726a54c8f00bd71825f84d02/src/transformers/modeling_flash_attention_utils.py#L206

If the doc is correct:

> position_ids (`torch.Tensor`): Boolean or int tensor of shape (batch_size, sequence_length), 1 means valid and 0 means not valid.
Yeah, exactly, but I am mostly talking about the behaviour of this `position_ids`. The doc says it is like an attention mask, but in the real world I think...
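To make the discrepancy concrete, here is a small illustration of my own (the tensors are made-up examples, not repo code): the mask-style reading the docstring describes versus the per-sequence position indices that models actually pass in the packed setup.

```python
import torch

# What the docstring describes: an attention-mask-like tensor,
# 1 for valid tokens and 0 for padding
mask_style = torch.tensor([[1, 1, 1, 1, 0]])

# What callers actually pass in the packed / padding-free setup:
# per-sequence position indices that reset to 0 at each new sequence
real_position_ids = torch.tensor([[0, 1, 2, 0, 1]])

# Under the mask-style reading, `position_ids == 0` would mark padding;
# under the real-world reading, it marks the *start* of each sequence,
# which is what the cu_seqlens computation above relies on
print((mask_style == 0).nonzero(as_tuple=True)[1])         # tensor([4])
print((real_position_ids == 0).nonzero(as_tuple=True)[1])  # tensor([0, 3])
```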
Same here: I got this error with the Qwen 3B model, but after changing the model the error went away completely.
Thanks a lot for verifying @jiaqiw09! Great to hear it works well on Ascend NPUs with torch 2.1.0 and torch_npu 2.1.0. Let me know if there’s anything else needed from...
Can you give me the exact location where you would like to see this function? Cc @SunMarc
I think this is better:

```python
from typing import Union

import torch

def __add_prefix_to_device(param_device: Union[str, torch.device, int], param_name: str) -> str:
    """
    Takes an integer device and adds the correct hardware prefix (e.g., 'npu:', 'mlu:') to it....
```
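For illustration only, here is one hypothetical way the truncated body could be completed; the backend checks, the fallback, and the unused `param_name` handling below are my assumptions, not the actual PR code:

```python
from typing import Union

import torch

def __add_prefix_to_device(param_device: Union[str, torch.device, int], param_name: str) -> str:
    """
    Hypothetical completion: map a bare integer device index to a prefixed
    device string for non-CUDA accelerators (e.g., 0 -> 'npu:0').
    """
    if isinstance(param_device, int):
        # Assumed lookup: use the prefix of whichever backend is available.
        # The hasattr guards keep this runnable on a plain torch install.
        if hasattr(torch, "npu") and torch.npu.is_available():
            return f"npu:{param_device}"
        if hasattr(torch, "mlu") and torch.mlu.is_available():
            return f"mlu:{param_device}"
        return f"cuda:{param_device}"
    # param_name would presumably feed an error message in the real helper
    return str(param_device)

# e.g. on an Ascend machine: __add_prefix_to_device(0, "model.weight") -> "npu:0"
```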
Thank you for guiding me through this, thanks a lot 🫡 Do we need to change something?