Lukas Kreussel

> TEI does not batch on CPU (yet).

Well, that certainly explains why it's a lot slower for batched inputs. I guess that's something that could easily be added later...

> PyTorch is likely to use mkl under the hood, you can check whether your version is compiled with mkl support by running `torch.backends.mkl.is_available()`

PyTorch seems to use `mkl`, i...
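
For anyone who wants to double-check their own install, a minimal sketch using standard PyTorch introspection (nothing here is specific to this issue):

```python
import torch

# True if this PyTorch build was compiled against Intel MKL
print("MKL available:", torch.backends.mkl.is_available())

# The full build configuration string also lists the BLAS backend in use
print(torch.__config__.show())
```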

Glad you enjoyed playing around with `llm-rs-python` a bit. I've already thought about adding it to the haystack-integrations and posting a short message + example in your Discord's `#show-and-tell` channel...

Alright, I'll try to post about this via Discord when I get back home from work, and I'll probably add a little disclaimer hinting that there will be breaking changes...

@TuanaCelik I haven't forgotten about this, and I'm still planning to add it after GGUF is finalized. But we still need to integrate full GGUF support into rustformers. And since...
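
For context, GGUF is a single-file container with a small fixed header. A minimal sketch of sniffing one in Python (field layout per the GGUF spec, assuming version 2 or later; `model.gguf` is a placeholder path):

```python
import struct

def read_gguf_header(path: str):
    """Read the fixed GGUF header: magic, version, tensor and metadata-KV counts."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        # little-endian: uint32 version, uint64 tensor_count, uint64 metadata_kv_count
        # (note: GGUF v1 used 32-bit counts instead)
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        return version, n_tensors, n_kv

print(read_gguf_header("model.gguf"))
```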

Until the changes from https://github.com/ggerganov/llama.cpp/issues/1455 get merged into ggml, we probably can't do anything here. Regarding `exllama`: that's something to consider after we've implemented https://github.com/rustformers/llm/issues/31

OK, I'm currently able to build with cuBLAS enabled. Could you provide info about your OS, CUDA version, and graphics card?

The error seems to be caused by this line: `CUBLAS_CHECK(cublasSetMathMode(g_cublas_handles[id], CUBLAS_TF32_TENSOR_OP_MATH));`, with `error 7` indicating that `CUBLAS_TF32_TENSOR_OP_MATH` is not supported for your CUDA and GPU combination. I don't know...
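
As a quick check on your side, something along these lines would confirm whether TF32 is even available (PyTorch is used here purely as a convenient probe of the CUDA runtime; TF32 requires an Ampere-or-newer GPU, i.e. compute capability 8.0+, and CUDA 11+):

```python
import torch

# CUDA runtime version this PyTorch build targets, e.g. "12.1" (None = CPU-only build)
print("CUDA:", torch.version.cuda)

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print("GPU:", torch.cuda.get_device_name(0), f"(compute capability {major}.{minor})")
    # TF32 tensor-op math is only implemented on compute capability >= 8.0 (Ampere)
    print("TF32 supported:", major >= 8)
```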

I tested it again on Ubuntu 22.04 via WSL with an RTX 3090 and CUDA 12.1, and it works as expected. I think it's a problem with the CUDA version...

Could you try building from the latest main branch again? There were some changes in the underlying `ggml` implementation.