Lukas Kreussel

> TEI does not batch on CPU (yet).

Well, that certainly explains why it's a lot slower for batched inputs. I guess that's something that could easily be added later...

> PyTorch is likely to use mkl under the hood, you can check whether your version is compiled with mkl support by running `torch.backends.mkl.is_available()`

PyTorch seems to use `mkl`, i...
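
For anyone who wants to double-check their own install, a minimal sketch using standard PyTorch introspection (nothing here is specific to this issue):

```python
import torch

# True if this PyTorch build was compiled against Intel MKL
print("MKL available:", torch.backends.mkl.is_available())

# The full build configuration string also lists the BLAS backend in use
print(torch.__config__.show())
```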

Glad you enjoyed playing around with `llm-rs-python` a bit. I've already thought about adding it to the haystack-integrations and posting a short message + example in your Discord's `#show-and-tell` channel...

Alright, I'll try to post about this via Discord when I get back home from work, and I'll probably add a little disclaimer hinting that there will be breaking changes...

@TuanaCelik I haven't forgotten about this, and I'm still planning to add it after GGUF is finalized. But we still need to integrate full GGUF support into rustformers. And since...
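
For context, GGUF is a single-file container with a small fixed header. A minimal sketch of sniffing one in Python (field layout per the GGUF spec, assuming version 2 or later; `model.gguf` is a placeholder path):

```python
import struct

def read_gguf_header(path: str):
    """Read the fixed GGUF header: magic, version, tensor and metadata-KV counts."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic={magic!r})")
        # little-endian: uint32 version, uint64 tensor_count, uint64 metadata_kv_count
        # (note: GGUF v1 used 32-bit counts instead)
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        return version, n_tensors, n_kv

print(read_gguf_header("model.gguf"))
```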

Until the changes from https://github.com/ggerganov/llama.cpp/issues/1455 get merged into ggml, we probably can't do anything here. Regarding `exllama`: that's something to consider after we've implemented https://github.com/rustformers/llm/issues/31

OK, I'm currently able to build with cuBLAS enabled. Could you provide info about your OS, CUDA version, and graphics card?

The error seems to be caused by this line: `CUBLAS_CHECK(cublasSetMathMode(g_cublas_handles[id], CUBLAS_TF32_TENSOR_OP_MATH));`, with `error 7` indicating that `CUBLAS_TF32_TENSOR_OP_MATH` is not supported for your CUDA and GPU combination. I don't know...
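
As a quick check on your side, something along these lines would confirm whether TF32 is even available (PyTorch is used here purely as a convenient probe of the CUDA runtime; TF32 requires an Ampere-or-newer GPU, i.e. compute capability 8.0+, and CUDA 11+):

```python
import torch

# CUDA runtime version this PyTorch build targets, e.g. "12.1" (None = CPU-only build)
print("CUDA:", torch.version.cuda)

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print("GPU:", torch.cuda.get_device_name(0), f"(compute capability {major}.{minor})")
    # TF32 tensor-op math is only implemented on compute capability >= 8.0 (Ampere)
    print("TF32 supported:", major >= 8)
```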

I tested it again on Ubuntu 22.04 via WSL with an RTX 3090 and CUDA 12.1, and it works as expected. I think it's a problem with the CUDA version...

Could you try building from the latest main branch again? There were some changes in the underlying `ggml` implementation.