aciddelgado

Results 6 issues of aciddelgado

I’ve discovered a performance gap between the Neural Speed Matmul operator and the Llama.cpp operator in the Neural-Speed repository. This issue was identified while running a benchmark with the ONNXRuntime-GenAI...

enhancement

As the title says, is this scenario possible right now or is it on the roadmap?

### Description This PR introduces a slight change to the handling of the Local Window Size parameter in the context of Memory Efficient Attention. Previously, setting the Local Window Size...

### Description Found a bug with num splits: the heuristic isn't performed properly because the sequence length is passed incorrectly to the heuristic function. ### Motivation and Context We...

### Description This PR adds support for Interactive Decoding via a 2-D seqlens_k tensor, which holds the past and total sequence lengths of each sequence in a...

### Description Implement softcap for GQA. ### Motivation and Context Fixes certain models, such as Gemma-2, that need softcap to work so they don't output NaNs.
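
For readers unfamiliar with softcap, here is a minimal sketch of the tanh-based soft-capping that Gemma-2 applies to attention logits, `cap * tanh(score / cap)`. It is plain Python for illustration only and is not the code from the PR; the function names, the sample scores, and the choice to treat a non-positive cap as "disabled" are all assumptions made for this example.

```python
import math


def softcap(scores, cap):
    """Smoothly bound each score to [-cap, cap] via cap * tanh(score / cap)."""
    if cap <= 0:
        # Assumption for this sketch: a non-positive cap means "no capping".
        return list(scores)
    return [cap * math.tanh(s / cap) for s in scores]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw attention scores, including extreme values.
raw_scores = [1.0, 50.0, 1000.0, -1000.0]
capped = softcap(raw_scores, cap=50.0)
probs = softmax(capped)
```

Small scores pass through almost unchanged (tanh is close to the identity near zero), while extreme scores saturate at the cap, so the values fed to softmax stay in a bounded range.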