jon-chuang
In `benchmark/benchmark-q4_0-matmult.c`, set `sizey = sizez = N`, `sizex = K`:

```
For K=128, N=2,  the deviation is expected 1020.00,   got 1280.00
For K=128, N=32, the deviation is expected 262144.00, got 508160.03
For K=64,  N=32, the deviation is expected 131072.00,...
```
`allclose` tests that all the floats in two tensors of identical size are within an epsilon error tolerance. See also: https://pytorch.org/docs/stable/generated/torch.allclose.html

```c++
bool allclose(ggml_tensor * a, ggml_tensor * b, f32...
```
Here are some outstanding issues for LoRA:

- [x] Base implementation (https://github.com/ggerganov/llama.cpp/pull/820)
- [ ] Improve LoRA application time with SIMD (AVX, AVX2) (https://github.com/ggerganov/llama.cpp/issues/956)
- [ ] Improve LoRA loading...
Fixes: https://github.com/ggerganov/llama.cpp/issues/932

Hyperthreading hurts performance here, probably because we are compute-bound (not memory-bound), so sibling logical cores contend for the same execution units. See also: https://github.com/ggerganov/llama.cpp/issues/34

Notes: I consulted GPT-4 in the making of this PR.
**Context:** [LoRA](https://arxiv.org/abs/2106.09685) requires computing $W' = W + \Delta W$, where $\Delta W = BA^T$ and $A, B$ are tall and skinny. See https://github.com/ggerganov/llama.cpp/pull/820 for the use case.

**Problem:** The estimated FLOPs...
Preliminary results show that `llama.cpp` is 1.5x-2x _slower_ than `llama-rs`. Both were verified to compile with the same arch flags and to use the same GNU toolchain. Summary (on `Vicuna...
For instance, should llama.cpp:

1. support an embedded vector-similarity knowledge base?
2. support other models for multimodality (similar to GPT-4)? (See e.g. [CLIP-based](https://github.com/facebookresearch/llama/issues/258).)

It doesn't have to be part of the...
https://github.com/ggerganov/llama.cpp/blob/8b679987cdce292ff36bd741f6715e4927e26f9b/llama.cpp#L1558 is currently single-threaded. Quantization is quite slow (Vicuna 7B: 65156.31 ms; Vicuna 13B: 129902.48 ms).
With https://github.com/ggerganov/llama.cpp/pull/820, the loaded model may differ from the base model. It makes sense to be able to interactively export the currently loaded model to a binfile, especially if one...
Once https://github.com/ggerganov/llama.cpp/pull/820 is merged, it would be nice to allow linearly interpolating one or more LoRAs. LoRAs should be loadable interactively, with interpolation weights also adjustable interactively.