Zoltan Fedor
Hah, I just linked from my old issue to your newly created one, as I thought maybe the one you created will get more attention than mine. Hah :-)
@etiennedi , I am getting confused — is filtering with BM25 already supported? I was just looking at my Haystack code and saw that just a few days ago...
Okay, then that Haystack PR cannot be correct. That is odd, as it has a unit test which I wrote back in July which catches the error thrown by Weaviate and...
Hi @etiennedi , As suspected, that Haystack PR was wrong: it incorrectly assumed that Weaviate now supports filters with BM25 (and also included a bug causing it in reality to run an...
The note from the KV-cache implementation on BART states: _"Note: current implementation of K-V cache does not exhibit performance gain over the non K-V cache TensorRT version. Please consider to...
We are very much looking forward to that! Hopefully that also applies to the OP's scenario: large inputs to T5 models.
> While waiting for this update, we started using NVIDIA's FasterTransformer library instead. It has a highly optimized T5 GPU runtime with KV cache supported and it's 5-10x faster than...
Also, Haystack could be used for model serving — as a replacement for OpenAI for those who want to serve their own LLM. Haystack pipelines integrate with Ray Serve to...
Yeah, I do use Haystack pipelines with nodes acting as clients for NVIDIA Triton for serving the LLMs locally / building LangChain tools for the agent — blazing fast.
@notkriswagner, Thanks. I have actually never tested by killing a PHP script. No, I have Apache with PHP 7.2 (mod_php7), and when HTTP calls which execute a Snowflake query get...