H-Simpson123

3 comments by H-Simpson123

The FAISS code is called from a client, and the OpenLLM server is running on a different machine. The OOM crash is happening on the server side.

> afaik 16GB of RAM should be able to load the model. Can you try in int8?

This is with int8. Please check my OP. The problem is not initial...

Thanks for the quick update. Any plans from your side to add an optimized implementation to HF transformers?