Nicolas Patry comments

Results: 978 comments of Nicolas Patry

Use the HuggingFace llama Tokenizer

For RWKV, I have no idea what tokenizer they use. Do you have a link?

Non-`ggml` backend

> So I don't think it would even be able to get close to the current approach on CPU at least.

You'd be surprised :) matmul is still linked against...

Non-`ggml` backend

Hey, I've started checking whether the code from ggml could be done in pure Rust. Here's the first draft: https://github.com/Narsil/rblas It's x86_64, AVX-only right now, and I'm getting 2x slower...

Non-`ggml` backend

> the vast, vast majority of the time was spent just in the matrix multiplication. The rest was basically insignificant. The softmax and layer norm can start to take up...

Non-`ggml` backend

> On my Linux machine, it gives the same performance (both speed and CPU utilization) as Intel MKL, which is surprising enough that I kind of doubt my result, but...

Interesting numbers (they seem pretty high, are you modifying shapes?)

```
test bench_faer_rs_n ... bench: 432,096 ns/iter (+/- 73,060)
test bench_faer_rs_t ... bench: 721,426 ns/iter (+/- 200,362)
test bench_ggblas_n...
```

Non-`ggml` backend

> I will try it in your smelte-rs project to see how it behaves

Thanks. I don't have much time to add new features to it. In general MKL used...

Non-`ggml` backend

> a computation graph abstraction

I feel forced to say that this approach has major drawbacks, the biggest of all being that it's hard to implement efficient runtimes. onnxruntime has...

Non-`ggml` backend

> Has anyone done 4-bit quantization on CUDA? Or is this specifically for CPU optimizations?

GPTQ does it: https://github.com/qwopqwop200/GPTQ-for-LLaMa (Triton backed, so you could steal their ptx file...

ERROR: Failed building wheel for tokenizers

We already build wheels for Apple Silicon! Just not for Python 3.8, which isn't supposed to exist on M1 (only 3.9, 3.10, and 3.11 now).