
The service for text indexing is too laggy

Open Lxx-c opened this issue 1 year ago • 3 comments

Self Checks

  • [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.6.16

Cloud or Self Hosted

Self Hosted (Source)

Steps to reproduce

As the title says, the service stalls while documents are being indexed, especially when the uploaded documents are large, and the cause is unclear. According to the documentation, the main service and the worker service are separate, so indexing documents should not affect the main service. I deployed from source; by "main service" I mainly mean the Studio session.

✔️ Expected Behavior

No response

❌ Actual Behavior

No response

Lxx-c avatar Aug 08 '24 01:08 Lxx-c

To address the significant lag with the text indexing service in Dify version 0.6.16, you should ensure that the worker service responsible for handling asynchronous tasks is properly configured and has sufficient resources. Here are some steps you can take to identify and resolve the issue:

  1. Check Resource Allocation: Ensure that your Docker virtual machine (VM) has enough resources allocated. The documentation recommends a minimum of 2 virtual CPUs (vCPUs) and 8 GB of initial memory. You can adjust these settings in Docker Desktop.

  2. Verify Worker Service Configuration: Make sure the worker service is running correctly and is configured to handle the necessary queues. The command to start the worker service on Linux or macOS is:

    celery -A app.celery worker -P gevent -c 1 -Q dataset,generation,mail --loglevel INFO
    

    For Windows, use:

    celery -A app.celery worker -P solo --without-gossip --without-mingle -Q dataset,generation,mail --loglevel INFO
    
  3. Monitor Worker Logs: Check the logs of the worker service for any errors or warnings that might indicate issues with task processing. The logs can provide insights into what might be causing the lag.

  4. Database Performance: Ensure that your PostgreSQL and Redis instances are performing optimally. Slow database queries or insufficient resources allocated to these services can cause delays in task processing.

  5. Increase Concurrency: If the worker service is under heavy load, you might need to increase the concurrency level. You can raise it by adjusting the -c parameter:

    celery -A app.celery worker -P gevent -c 4 -Q dataset,generation,mail --loglevel INFO
    

    This command raises the worker's concurrency to 4. With the gevent pool (-P gevent), -c sets the number of greenlets rather than the number of OS processes.

  6. Ensure Proper Installation: Follow the detailed installation steps provided in the documentation to ensure all dependencies and services are correctly set up. This includes setting up PostgreSQL, Redis, and Weaviate if not already installed locally. Use the following command to start these services:

    cd docker
    docker compose -f docker-compose.middleware.yaml up -d
    
  7. Environment Configuration: Make sure your environment variables are correctly set up. Copy the example environment file and generate a random secret key:

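    cp .env.example .env
    # Generate a random key and write it into .env.
    # '|' is the sed delimiter because base64 output can contain '/'.
    SECRET_KEY_VALUE=$(openssl rand -base64 42)
    sed -i "s|^SECRET_KEY=.*|SECRET_KEY=${SECRET_KEY_VALUE}|" .env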
    
  8. Python Environment Setup: Ensure you have Python 3.10.x installed. Using Anaconda can simplify this process:

    conda create --name dify python=3.10
    conda activate dify
    
  9. Install Dependencies and Migrate Database: Navigate to the api directory, install dependencies, and run database migrations:

    cd api
    pip install -r requirements.txt
    flask db upgrade
    
  10. Start API Service: Start the API service to ensure it is running correctly:

    flask run --host 0.0.0.0 --port=5001 --debug
    

By following these steps, you should be able to identify and resolve the lag issue with the text indexing service in your self-hosted Dify instance [1][2][3].
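One way to check whether indexing tasks are piling up faster than the worker consumes them is to look at the queue lengths directly. The following is a minimal sketch, not part of Dify: it assumes Redis is the Celery broker, in which case each queue is stored as a Redis list named after the queue and `LLEN` returns the number of pending tasks. The function name `queue_depths` and the `REDIS_CLI` override variable are hypothetical, introduced only for this illustration.

```shell
# Sketch: print "<queue> <pending task count>" for each Celery queue.
# Assumption: Redis is the Celery broker, so each queue is a Redis list
# named after the queue and LLEN reports how many tasks are waiting.
# REDIS_CLI is a hypothetical override for the redis-cli binary.
queue_depths() {
    local broker_url="$1"
    shift
    local cli="${REDIS_CLI:-redis-cli}"
    local queue
    for queue in "$@"; do
        # redis-cli prints the bare integer when its output is captured
        printf '%s %s\n' "$queue" "$("$cli" -u "$broker_url" llen "$queue")"
    done
}
```

For example, `queue_depths "redis://localhost:6379/1" dataset generation mail` prints one line per queue. The URL here is a placeholder; it should match the broker URL your worker actually uses. A `dataset` count that keeps growing during an upload suggests the worker is the bottleneck, while a count near zero points elsewhere.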


dosubot[bot] avatar Aug 08 '24 01:08 dosubot[bot]

The same problem seems to be very common. Even when I run separate Celery workers and set the concurrency of each one to 128, uploading particularly large files still blocks all services, not just the dataset service. @JohnJyong

401557122 avatar Aug 09 '24 08:08 401557122

@401557122 Hi, have you solved this problem? @JohnJyong @crazywoola Have you located the cause? This problem seriously affects normal use, and the issue has been open for 2 weeks.

Lxx-c avatar Aug 19 '24 01:08 Lxx-c

No logs or error information were provided, which makes it impossible to troubleshoot. You can reopen the issue with more detail.
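For anyone reopening this, one piece of evidence that would help is a record of how the API responds while a large document is being indexed. The following is a minimal sketch under stated assumptions: `sample_latency` and the `CURL` override variable are hypothetical names, and the URL, sample count, and interval are chosen by the caller. It only uses standard curl options (`-s`, `-o`, `-w '%{time_total}'`, `--max-time`).

```shell
# Sketch: record the response time of one URL at a fixed interval, so a
# slowdown during indexing can be attached to the issue as evidence.
# CURL is a hypothetical override for the curl binary.
sample_latency() {
    local url="$1" count="${2:-10}" interval="${3:-1}"
    local curl_bin="${CURL:-curl}"
    local i=1
    while [ "$i" -le "$count" ]; do
        # -w prints the total request time in seconds; the body is discarded
        printf '%s %s\n' "$(date +%H:%M:%S)" \
            "$("$curl_bin" -s -o /dev/null -w '%{time_total}' --max-time 30 "$url")"
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

Run it against any URL served by the API process (port 5001 in the steps above) once before uploading and once during the upload, and attach both outputs together with the worker log from the same time window.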

JohnJyong avatar Nov 08 '24 09:11 JohnJyong