Ivan Komarov

49 comments by Ivan Komarov

I think these changes look great. You said [elsewhere](https://github.com/ggerganov/llama.cpp/issues/1129#issuecomment-1526057911) that this stuff might cause "some friction", but I think it turns out to be very non-intrusive. The CUDA stuff is...

@slaren Yeah, I totally agree with your overall message -- adding new quantization methods (which, judging by the issues/discussions, will keep appearing) by porting a reference implementation should be easy, and...

@SlyEcho

> On a sidenote, there could also be a way to unify the CL and CUDA/ROCm kernels using some "clever" techniques

Oh wow, I managed to totally miss the...

> When I re-ran the tests with the kernel in this PR prompt processing was ~57% faster compared to master.

Whoa, this is nice to hear. Unfortunately I don't have...

> But more importantly, I think that my kernel is simpler

Yup, I think your version is pretty much what I started with. The additional complexity in my version...

Closing this, since this PR is outdated and largely superseded by @JohannesGaessler's [efforts](https://github.com/ggerganov/llama.cpp/pull/1341#issuecomment-1546690422).

I also stumbled upon this weird error. It turns out that [child_process.spawn()](https://nodejs.org/api/child_process.html#child_processspawncommand-args-options) reports `ENOENT` if the current working directory of the spawned process doesn't exist, even if the process binary itself...

@GaetanLepage

> Would it still be feasible to create a tag for this release?

JFYI: the tag is not likely to help here, since the commit history for the...

> Feel free to close this issue then.

Uh, sorry, I worded that confusingly. :/ Having a `v2.2.0` tag would be very valuable, exactly because it's supposed to be immutable...

> Can you confirm that it works?

It does, thank you! I just closed issue #3525. Regarding this issue: given that the branch history is restored, I think it is...