mobicham

Results: 113 comments by mobicham

@i3hz I will also do some debugging next week

@i3hz I tried the slicing solution, but it throws an attention error even without torch.compile:
```
RuntimeError: The size of tensor a (4) must match the size of tensor b...
```

So this issue comes from this part https://github.com/huggingface/transformers/blob/7f5c20945a97ed960eb85d96b93c89f33772fd20/src/transformers/models/whisper/modeling_whisper.py#L329-L330. If you replace it with this, it works:
```Python
key_states = past_key_values.layers[self.layer_idx].keys[:bsz]
value_states = past_key_values.layers[self.layer_idx].values[:bsz]
```
However, the problem is that we can't...

@i3hz yeah, because the issue is that, at some point, it returns `self.keys` and `self.values`, not just for Whisper but also for other models. The `self.keys_ / self.values_` trick...
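To make the trick above easier to follow, here is a minimal sketch of the idea: keep the full pre-allocated buffers under `keys_` / `values_` and hand out batch-truncated views from `update`. The class name, arguments, and shapes are made up for illustration; this is not the actual transformers `StaticCache` code.
```Python
import torch

class StaticLayerSketch:
    """Toy static-cache layer: pre-allocates full-size key/value buffers
    (`keys_` / `values_`) and returns batch-truncated views. Names and
    shapes are assumptions for illustration only."""

    def __init__(self, max_batch, num_heads, max_seq_len, head_dim,
                 dtype=torch.float16, device="cpu"):
        # Full fixed-size buffers: stable storage is what makes the cache
        # friendly to torch.compile / CUDA graphs.
        self.keys_ = torch.zeros(max_batch, num_heads, max_seq_len, head_dim,
                                 dtype=dtype, device=device)
        self.values_ = torch.zeros_like(self.keys_)

    def update(self, key_states, value_states, cache_position):
        bsz = key_states.shape[0]
        # Write the new tokens into the full buffers in place.
        self.keys_[:bsz, :, cache_position] = key_states
        self.values_[:bsz, :, cache_position] = value_states
        # Return views truncated to the runtime batch size, so the attention
        # shapes match the inputs (avoids the size-mismatch error above).
        return self.keys_[:bsz], self.values_[:bsz]
```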

> Probably a bit more informative name would be better, it's easy to lose track when reading the code

Yeah, probably `self.keys_, self.values_` -> `self.keys_full, self.values_full` or something like that

> ...

Thanks @i3hz, I will test compile with your latest changes on Monday

@i3hz your version gives me `Segmentation fault (core dumped)`. You also need to assign `self.keys` and `self.values` as the truncated cache, not the full cache; I don't think it's possible...

@i3hz it doesn't matter which stage is compiled actually, I was getting a seg fault even without compile 🤔

> BUT the output is incorrect

For Whisper only, trying to see...

@i3hz yes, it works 👍 The incorrect output is not related to the static cache code update we are doing here; the output is incorrect with Whisper + static even...

> The incorrect output is not related to the static cache code update we are doing here; the output is incorrect with Whisper + static even without the code update...