左冯翊
Results: 2 comments of 左冯翊
> The main sources of high "time-to-first-token" latency in RAGFlow are typically the retrieval and query refinement stages, especially when using large embedding models like Qwen3-Embedding-8B and agent workflows. These...
> GPU has nothing to do with RAGFlow. You could deploy an embedding inference service on a GPU, which accelerates the indexing and searching procedure. @KevinHuSh I’ve just...