Ivan Komarov

5 issues by Ivan Komarov

This PR is a mostly failed attempt to fix [issue #95](https://github.com/karpathy/minGPT/issues/95) from the minGPT repo. The idea is to save the results of key and value projections in each self-attention...
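A minimal sketch of the general idea of caching key and value projections, not the PR's actual code: it uses a single attention head, plain Python lists, and hypothetical names (`CachedAttention`, `last_output_no_cache`, `demo`) chosen only for illustration. It checks that projecting only the newest token and reusing stored keys and values gives the same output as recomputing every projection over the whole prefix.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Single-head scaled dot-product attention of one query over all keys."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]


def last_output_no_cache(Wq, Wk, Wv, xs):
    """Baseline: recompute key/value projections for every token in the prefix."""
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)


class CachedAttention:
    """Stores projected keys/values so each step projects only the new token."""

    def __init__(self, Wq, Wk, Wv):
        self.Wq, self.Wk, self.Wv = Wq, Wk, Wv
        self.keys, self.values = [], []

    def step(self, x):
        # Only the newest token goes through the key and value projections.
        self.keys.append(matvec(self.Wk, x))
        self.values.append(matvec(self.Wv, x))
        return attend(matvec(self.Wq, x), self.keys, self.values)


def demo(dim=4, steps=5, seed=0):
    """Return the largest difference between cached and uncached outputs."""
    rng = random.Random(seed)

    def rand_matrix():
        return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    Wq, Wk, Wv = rand_matrix(), rand_matrix(), rand_matrix()
    xs = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(steps)]
    layer = CachedAttention(Wq, Wk, Wv)
    max_diff = 0.0
    for t in range(steps):
        cached = layer.step(xs[t])
        full = last_output_no_cache(Wq, Wk, Wv, xs[: t + 1])
        max_diff = max(max_diff, max(abs(a - b) for a, b in zip(cached, full)))
    return max_diff
```

The cache trades memory that grows with sequence length for not repeating the key and value projections at every generation step. Whether that is a net win in a given model depends on details this sketch does not capture.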

Apologies for the slightly clickbaity title: while technically true, as mentioned in [this comment](https://github.com/ggerganov/llama.cpp/pull/654#issuecomment-1493177071), the current AVX-512 implementation is slower than the regular AVX2 implementation. Compared to the current AVX2...

performance
high priority

**NB**: this is a proof-of-concept PR which only modifies `dequantize_block_q4_0()` (this is the quantization method I'm most familiar with). If there is interest, I will modify all quantization kernels. **NB2**:...

The forward/backward passes of MLPs in mixture-of-experts models are a perfect fit for the grouped GEMM implementation in CUTLASS (for example, the `grouped_gemm` library [uses](https://github.com/tgale96/grouped_gemm/blob/main/csrc/grouped_gemm.cu#L347) CUTLASS for the forward pass)....
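To make the term concrete, here is a pure-Python reference for what a grouped GEMM computes in this setting: one independent matrix product per expert, where each expert may receive a different number of token rows. This only illustrates the semantics, under the assumption that tokens are already sorted by expert; the function name and argument layout are hypothetical and are not the API of CUTLASS or `grouped_gemm`.

```python
def matmul(a, b):
    """Plain matrix product of a (m x k) and b (k x n), both lists of rows."""
    inner, cols = len(b), len(b[0])
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def grouped_gemm_reference(tokens, expert_weights, tokens_per_expert):
    """Apply each expert's weight matrix to its own contiguous slice of tokens.

    tokens            -- token rows, assumed already sorted by expert
    expert_weights    -- one (k x n) weight matrix per expert
    tokens_per_expert -- number of rows routed to each expert (may be 0)
    """
    assert sum(tokens_per_expert) == len(tokens)
    out, start = [], 0
    for weight, count in zip(expert_weights, tokens_per_expert):
        if count:  # an expert with no tokens contributes no output rows
            out.extend(matmul(tokens[start:start + count], weight))
        start += count
    return out
```

A grouped GEMM kernel performs all of these differently sized products in one launch instead of looping over experts as this reference does.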

I recently found out that [this is a thing](https://en.wikichip.org/wiki/intel/crystal_well) when trying to run a `candle` program (which depends on `gemm`) on this machine: ``` # grep 'model name' /proc/cpuinfo model...