[Question]: Parsing time is so long

Open ChenTao98 opened this issue 10 months ago • 5 comments

Describe your problem

Thanks for your work. I have deployed the ragflow system on my own server.

However, when I upload a PDF file (2 pages), it takes a long time to parse (more than 300 seconds).

log for file 1

Process began at:
Tue, 16 Apr 2024 13:52:45 GMT
Process duration:
385.359
Progress message:
Page(1~2): OCR is running...
Page(1~2): OCR finished
Page(1~2): Layout analysis finished.
Page(1~2): Table analysis finished.
Page(1~2): Text merging finished
Page(1~2): Finished slicing files(3). Start to embedding the content.
Page(1~2): Finished embedding! Start to build index!
Page(1~2): Done!

log for file 2

Process began at:
Tue, 16 Apr 2024 14:08:13 GMT
Process duration:
771.436
Progress message:
Page(1~2): OCR is running...
Page(1~2): OCR finished
Page(1~2): Layout analysis finished.
Page(1~2): Table analysis finished.
Page(1~2): Text merging finished
Page(1~2): Finished slicing files(3). Start to embedding the content.
Page(1~2): Finished embedding! Start to build index!
Page(1~2): Done!

ChenTao98 avatar Apr 16 '24 06:04 ChenTao98

You can try using GPU resources for parsing. With the default Docker deployment, the GPU is not used. If you check "docker/docker-compose.yml" and "docker/docker-compose-cn.yml", you can see that there is no GPU-related configuration when the Docker containers are created.

You just need to stop and delete the containers that have already been started, add the following configuration to these two files, and run Docker Compose again. When you parse again, it should be much faster with the GPU.

deploy:
  resources:
    reservations:
      devices:
        - driver: nvidia
          device_ids: ['0']
          capabilities: [gpu]

For example, the complete docker-compose.yml looks like this:

version: '2.2'
include:
  - path: ./docker-compose-base.yml
    env_file: ./.env
services:
  ragflow:
    depends_on:
      mysql:
        condition: service_healthy
      es01:
        condition: service_healthy
    image: infiniflow/ragflow:v1.0
    container_name: ragflow-server
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            device_ids: ['0']
            capabilities: [gpu]
    ports:
      - ${SVR_HTTP_PORT}:9380
      - 80:80
      - 443:443
    volumes:
      - ./service_conf.yaml:/ragflow/conf/service_conf.yaml
      - ./entrypoint.sh:/ragflow/entrypoint.sh
      - ./ragflow-logs:/ragflow/logs
      - ./nginx/ragflow.conf:/etc/nginx/conf.d/ragflow.conf
      - ./nginx/proxy.conf:/etc/nginx/proxy.conf
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
    environment:
      - TZ=${TIMEZONE}
    networks:
      - ragflow
    restart: always
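
After restarting the containers with the GPU reservation, one way to confirm that the GPU is actually visible inside the container is to run nvidia-smi in it. The following is a minimal sketch, assuming the container name ragflow-server from the file above and assuming that nvidia-smi is available inside the container (which depends on the NVIDIA container runtime being set up on the host). The function names are hypothetical helpers, not part of ragflow.

```python
import subprocess


def build_check_command(container="ragflow-server"):
    """Return the docker command that runs nvidia-smi inside the container."""
    return ["docker", "exec", container, "nvidia-smi"]


def gpu_visible(container="ragflow-server", run=subprocess.run):
    """Return True if nvidia-smi exits successfully inside the container.

    `run` is injectable so the check can be exercised without Docker.
    """
    try:
        result = run(build_check_command(container), capture_output=True, text=True)
    except OSError:
        # The docker executable is missing or could not be started.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("GPU visible in container:", gpu_visible())
```

If this reports False, the reservation was probably not applied, or the host lacks the NVIDIA container runtime; a True result only shows that the device is exposed, not that parsing actually uses it.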

wzikang avatar Apr 16 '24 09:04 wzikang

Yes, it takes a long time, but it's worth it.

ysyx2008 avatar Apr 16 '24 14:04 ysyx2008

I have tried adding the GPU configuration, but it doesn't work. Is a specific version of CUDA or the NVIDIA driver required? The versions I am using: NVIDIA-SMI 495.29.05, Driver Version: 495.29.05, CUDA Version: 11.5.

ChenTao98 avatar Apr 17 '24 01:04 ChenTao98

This is the setup I am using, on which I have verified that GPU parsing works normally:

Docker Compose: v2.21.0
Nvidia Driver: 524.147.05
CUDA Version: 12.0

You can check your Docker Compose version, as the GPU configuration may vary between Docker Compose versions.
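
To compare your own setup with the one listed here, the Compose version can be read programmatically. This is a rough sketch, assuming the Compose v2 plugin, which is invoked as `docker compose version` (the older standalone tool is invoked as `docker-compose` instead). The helper names are illustrative only.

```python
import re
import subprocess


def parse_compose_version(text):
    """Extract (major, minor, patch) from text such as 'Docker Compose version v2.21.0'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_compose_version(run=subprocess.run):
    """Return the installed Compose version tuple, or None if it cannot be determined."""
    try:
        result = run(["docker", "compose", "version"], capture_output=True, text=True)
    except OSError:
        # The docker executable is missing or could not be started.
        return None
    if result.returncode != 0:
        return None
    return parse_compose_version(result.stdout)


if __name__ == "__main__":
    print("Docker Compose version:", installed_compose_version())
```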

wzikang avatar Apr 17 '24 01:04 wzikang

Working perfectly on driver 525.125.06, CUDA 12.0, ragflow v2.0 6.0. Build tested on 2024-05-22 16:30 GMT-3 (America/Sao_Paulo).

vbmcpy avatar May 22 '24 19:05 vbmcpy

You need CUDA 12, and I believe the minimum hardware for CUDA 12 is the GTX 980 generation.

alex-ca1123 avatar May 26 '24 18:05 alex-ca1123
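
The CUDA 12 requirement suggested in this thread can be checked against the header line that nvidia-smi prints. The sketch below parses that line and compares the reported CUDA version with 12.0. The threshold comes only from the comments above and is not a confirmed ragflow requirement, and the function names are hypothetical. Note that the CUDA version shown by nvidia-smi is the highest version the installed driver supports.

```python
import re

# Matches the header fields as they appear in nvidia-smi output.
HEADER_PATTERN = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>\d+\.\d+)"
)


def parse_nvidia_smi_header(text):
    """Return (driver_version, (cuda_major, cuda_minor)) or None if not found."""
    match = HEADER_PATTERN.search(text)
    if match is None:
        return None
    major, minor = (int(part) for part in match.group("cuda").split("."))
    return match.group("driver"), (major, minor)


def meets_cuda_12(text):
    """Return True if the reported CUDA version is at least 12.0 (threshold assumed from the thread)."""
    parsed = parse_nvidia_smi_header(text)
    return parsed is not None and parsed[1] >= (12, 0)


if __name__ == "__main__":
    header = "NVIDIA-SMI 495.29.05 Driver Version: 495.29.05 CUDA Version: 11.5"
    print(parse_nvidia_smi_header(header), meets_cuda_12(header))
```

With the header reported earlier in the thread (driver 495.29.05, CUDA 11.5), the check fails, which is consistent with the GPU configuration not working on that machine.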