Lubomir Litchev
### Request description This is an RFC proposal for speeding up the current TopK implementation. Problem that this proposal addresses In one LLM model we see the TopK function taking...
This is a hybrid (vector and scalar) implementation of TopK. Elements are loaded and compared using vectors, and sorting and writing to the result are done using scalar registers....
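To illustrate the hybrid idea described above, here is a minimal Python sketch, not the actual kernel. It assumes a fixed chunk width (`LANES`, a hypothetical value) and simulates the vector compare with a per-chunk mask. The function name `hybrid_topk`, the tie-breaking rule (lower index wins), and the sorted-list bookkeeping are all assumptions made for illustration. The excerpt does not say how the real implementation handles these.

```python
import bisect

LANES = 8  # hypothetical vector width; the real width is not given in the source


def hybrid_topk(values, k):
    """Return the k largest values (descending) and their indices.

    Sketch only: the chunk-wide compare stands in for a vector compare,
    and the insertion loop stands in for the scalar sort/write path.
    """
    if k <= 0:
        return [], []
    # Sorted ascending by (value, -index), so top[0] is the weakest kept entry.
    top = []
    for base in range(0, len(values), LANES):
        chunk = values[base:base + LANES]
        if len(top) == k:
            # "Vector" step: compare every lane against the current threshold.
            threshold = top[0][0]
            mask = [v > threshold for v in chunk]
            if not any(mask):
                continue  # no lane can enter the result; skip the scalar path
        else:
            mask = [True] * len(chunk)
        # "Scalar" step: insert only the lanes that passed the compare.
        for lane, keep in enumerate(mask):
            if not keep:
                continue
            v = chunk[lane]
            if len(top) == k:
                if v <= top[0][0]:
                    continue  # threshold rose earlier in this same chunk
                top.pop(0)  # evict the weakest entry
            bisect.insort(top, (v, -(base + lane)))
    top.reverse()
    return [v for v, _ in top], [-i for _, i in top]
```

For example, `hybrid_topk([3, 9, 1, 7, 9, 2, 8, 5, 6, 4, 10, 0], 3)` keeps each winning index exactly once. The second chunk triggers scalar work only for the single lane that beats the threshold.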
### What happened? Running the following tests results in duplicated elements in the output. ```func.func @vector_call_topk_1x256() { %input_values = util.unfoldable_constant dense : tensor %out_values_empty = tensor.empty() : tensor %out_indices_empty =...