gspeter-max

Results: 16 comments of gspeter-max

```python
import torch

def prepare_fa2_from_position_ids(query, key, value, position_ids):
    query = query.view(-1, query.size(-2), query.size(-1))
    key = key.contiguous().view(-1, key.size(-2), key.size(-1))
    value = value.contiguous().view(-1, value.size(-2), value.size(-1))
    # This is YOUR logic we are...
```
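
For context, here is a minimal sketch of how the FlashAttention-2 varlen path can derive cumulative sequence lengths from packed `position_ids`, assuming positions restart at 0 for every packed sequence (the helper name and this derivation are my illustration, not the library's exact code):

```python
import torch

def cu_seqlens_from_position_ids(position_ids: torch.Tensor) -> torch.Tensor:
    """Build cumulative sequence lengths for FA2 varlen kernels.

    Assumes packed inputs where each sequence's positions restart at 0.
    """
    flat = position_ids.flatten()
    # Every 0 in the position counter marks the start of a new sequence.
    starts = torch.nonzero(flat == 0).squeeze(-1)
    total = torch.tensor([flat.numel()], device=flat.device, dtype=starts.dtype)
    return torch.cat([starts, total]).to(torch.int32)

pos = torch.tensor([[0, 1, 2, 0, 1]])  # two packed sequences: lengths 3 and 2
print(cu_seqlens_from_position_ids(pos))  # tensor([0, 3, 5], dtype=torch.int32)
```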

Hi @zucchini-nlp, I found a bug in the function: it returns incorrect output. I'm creating a PR to fix it, but I don't have access to the model weights to...

https://github.com/huggingface/transformers/blob/d5d007a1a0f0c11a726a54c8f00bd71825f84d02/src/transformers/modeling_flash_attention_utils.py#L206

If the doc is correct: position_ids (`torch.Tensor`): Boolean or int tensor of shape (batch_size, sequence_length), 1 means valid and 0 means not valid.

Yeah, exactly, but I am mostly talking about the behaviour of this position_ids. The doc says it is like an attention mask, but in the real world I think...
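
To make the discrepancy concrete, an illustrative sketch (the tensors are mine, not from the repo): the docstring describes 0/1 validity-mask semantics, while models in practice pass per-token positions that restart at 0 for each packed sequence.

```python
import torch

# As documented: a 0/1 validity mask (these are attention_mask semantics).
doc_style_position_ids = torch.tensor([[1, 1, 1, 1, 0, 0]])

# As actually used: per-token positions, restarting at 0 per packed sequence.
real_position_ids = torch.tensor([[0, 1, 2, 0, 1, 2]])
```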

Same here: I got this error with the Qwen 3B model, but after switching to a different model the error went away completely.

Thanks a lot for verifying @jiaqiw09! Great to hear it works well on Ascend NPUs with torch 2.1.0 and torch_npu 2.1.0. Let me know if there’s anything else needed from...

Can you give me the exact location where you'd like to see this function? Cc @SunMarc

I think this is better:

```python
from typing import Union

import torch

def __add_prefix_to_device(param_device: Union[str, torch.device, int], param_name: str) -> str:
    """
    Takes an integer device and adds the correct hardware prefix (e.g., 'npu:', 'mlu:') to it....
    """
```
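
For reference, a minimal sketch of what such a helper could look like, assuming the hardware prefix is passed in explicitly (the `prefix` argument and the body are my illustration; the real helper would presumably infer the accelerator type from the runtime and use `param_name` in its error messages):

```python
from typing import Union

import torch

def add_prefix_to_device(param_device: Union[str, torch.device, int],
                         prefix: str = "npu") -> str:
    """Normalize a device spec to a string, prefixing bare integer ids.

    `prefix` is a hypothetical parameter for this sketch; it stands in for
    the hardware detection (npu/mlu/cuda) the real helper would perform.
    """
    if isinstance(param_device, int):
        # A bare int like 0 is ambiguous across backends; attach the prefix.
        return f"{prefix}:{param_device}"
    # Strings and torch.device objects already carry their device type.
    return str(torch.device(param_device))

print(add_prefix_to_device(0))                    # npu:0
print(add_prefix_to_device("cuda:1"))             # cuda:1
print(add_prefix_to_device(torch.device("cpu")))  # cpu
```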

Thank you for guiding me on this, thanks a lot 🫡 We need to change something.