左冯翊

Results: 2 comments by 左冯翊

> The main sources of high "time-to-first-token" latency in RAGFlow are typically the retrieval and query refinement stages, especially when using large embedding models like Qwen3-Embedding-8B and agent workflows. These...

> GPU has nothing to do with RAGFlow. You could deploy an embedding inference service on a GPU, which accelerates the indexing and searching procedures. @KevinHuSh I’ve just...