Kevin Hu


Do not allocate GPUs to the RAGFlow server itself. You could deploy an embedding inference server on the GPUs instead, which accelerates the chunking procedure much more.
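For example, a GPU-backed embedding service can run separately and then be registered in RAGFlow as a model provider. A minimal sketch using Xinference (image tag and flags are assumptions; check the Xinference docs for your version):

```bash
# run an Xinference server with GPU access; RAGFlow then connects to it on port 9997
docker run -d --name xinference --gpus all -p 9997:9997 \
  xprobe/xinference:latest xinference-local -H 0.0.0.0
```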

I recommend using the `slim` version of the Docker image and not deploying RAGFlow itself with GPUs; the GPUs are better spent on embedding/LLM inference.
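The slim variant is selected through the same `RAGFLOW_IMAGE` variable in `docker/.env`; the exact tag below is an assumption and depends on the release you track:

```bash
# docker/.env — "-slim" images ship without the bundled embedding models (tag name is an assumption)
RAGFLOW_IMAGE=infiniflow/ragflow:dev-slim
```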

Set this in `docker/.env`: `RAGFLOW_IMAGE=registry.cn-hangzhou.aliyuncs.com/infiniflow/ragflow:dev`
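After editing `docker/.env`, the containers have to be recreated for the new image to take effect. A sketch, assuming the default compose file shipped under `docker/`:

```bash
cd ragflow/docker
docker compose down
docker compose -f docker-compose.yml up -d   # pulls the image set in .env and recreates the containers
```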

This is for the case where we can't find the max token length of the assigned LLM.

What kind of LLM did you use? Let me check whether there's a bug or something.

RAGFlow does not know the context length of models added through XInference, which needs to be improved.

It's definitely controlled by the context length of the LLM.

Developing with the Docker image/container is a more promising and faster way; see the sketch below.
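For instance, you can iterate directly inside the running container instead of setting up a from-source environment. A sketch, assuming the default container name from the bundled compose file:

```bash
# open a shell inside the running RAGFlow server container (default name is an assumption)
docker exec -it ragflow-server bash
# tail the server logs to verify your changes
docker logs -f ragflow-server
```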

Check out [this](https://github.com/infiniflow/ragflow/blob/main/sdk/python/test/t_document.py#L292)

The context length is exceeded. Adjust these 2 parameters, or cut down the chunk token number. ![image](https://github.com/user-attachments/assets/e5d6b51c-653e-40cb-82d5-5f4c8f9ee909)