
the api docker service took a long time to start

Open Gaocarri opened this issue 1 year ago • 7 comments

Self Checks

  • [X] This is only for bug report, if you would like to ask a question, please head to Discussions.
  • [X] I have searched for existing issues, including closed ones.
  • [X] I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [X] [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • [X] Please do not modify this template :) and fill in all the required fields.

Dify version

0.10.0, 0.10.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When I updated from 0.9.x to 0.10.0 and ran docker compose up, the api service was unavailable (502) for a long time; I only found it working the next day. When I then upgraded from 0.10.0 to 0.10.1, the same thing happened, and the api service is still unavailable now. Here is the docker log:

docker logs dfe44f71c60b

Running migrations
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
/app/api/.venv/lib/python3.10/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)

How can I solve this problem?

✔️ Expected Behavior

docker service starts normally

❌ Actual Behavior

The api service returns 502 and takes a very long time to become available.

Gaocarri avatar Oct 24 '24 02:10 Gaocarri

I found some similar issues and discussions that might be relevant to your problem:

  1. Local Docker Setup Failure - This discussion is unsolved and might provide some insights into Docker setup issues [1].

  2. About docker-compose deployment issues - Another unsolved discussion that could be related to your deployment issues [2].

  3. docker version can't start up after update to v1.0 - This closed issue suggests that a "No space left on device" error can cause the API server to shut down. The solution involves cleaning up Docker resources, checking disk usage, and possibly increasing Docker disk space [3].

Regarding your specific issue, the logs indicate that none of PyTorch, TensorFlow, or Flax have been found, which might affect model availability. Additionally, there's a warning about missing ffmpeg, which is required for certain functionalities like OpenAI TTS. You should ensure that these dependencies are installed and properly configured.

To address the migration delay, ensure that the MIGRATION_ENABLED environment variable is set to 'true', as migrations are executed before the application starts. Delays can occur if the migrations are extensive or if there are database connection issues [4].

Make sure to install ffmpeg on your system to resolve the related warning. This can be done using package managers like apt on Ubuntu or brew on macOS.

To continue talking to Dosu, mention @dosu.

dosubot[bot] avatar Oct 24 '24 03:10 dosubot[bot]

One hour later, I found that it had started successfully. What happened?

Running migrations
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
/app/api/.venv/lib/python3.10/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
Preparing database migration...
Starting database migration.
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
Database migration successful!
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
/app/api/.venv/lib/python3.10/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
[2024-10-24 03:01:46 +0000] [1] [INFO] Starting gunicorn 22.0.0
[2024-10-24 03:01:46 +0000] [1] [INFO] Listening at: http://0.0.0.0:5001 (1)
[2024-10-24 03:01:46 +0000] [1] [INFO] Using worker: gevent
[2024-10-24 03:01:46 +0000] [87] [INFO] Booting worker with pid: 87

Gaocarri avatar Oct 24 '24 03:10 Gaocarri

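I often run into this problem too. Every update feels like an ordeal: once it finishes, I can't get in anymore. I really don't dare to update anymore.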

wanshichenguang avatar Oct 24 '24 03:10 wanshichenguang

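I often run into this problem too. Every update feels like an ordeal: once it finishes, I can't get in anymore. I really don't dare to update anymore.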

Did you have the problem starting from 0.10.x?

Gaocarri avatar Oct 24 '24 03:10 Gaocarri

We have received several reports regarding this from 3 minutes to 1 hour. Not sure what's happening yet.

crazywoola avatar Oct 24 '24 06:10 crazywoola

We have received several reports regarding this from 3 minutes to 1 hour. Not sure what's happening yet.

This problem happened on my server machine, but on my MacBook the service started successfully within 2 minutes. I noticed that on the MacBook the pydub warning appears after the database migration succeeds, whereas on the server the pydub warning appears before the database migration succeeds.

MacBook:

2024-10-24 11:39:00 Running migrations
2024-10-24 11:39:17 sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
2024-10-24 11:39:17 sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
2024-10-24 11:39:29 Preparing database migration...
2024-10-24 11:39:29 Starting database migration.
2024-10-24 11:39:29 Database migration successful!
2024-10-24 11:39:42 sagemaker.config INFO - Not applying SDK defaults from location: /etc/xdg/sagemaker/config.yaml
2024-10-24 11:39:42 sagemaker.config INFO - Not applying SDK defaults from location: /root/.config/sagemaker/config.yaml
2024-10-24 11:39:48  
2024-10-24 11:39:48  -------------- celery@9546df7e67b5 v5.3.6 (emerald-rush)
2024-10-24 11:39:48 --- ***** ----- 
2024-10-24 11:39:48 -- ******* ---- Linux-6.6.31-linuxkit-aarch64-with-glibc2.40 2024-10-24 03:39:48
2024-10-24 11:39:48 - *** --- * --- 
2024-10-24 11:39:48 - ** ---------- [config]
2024-10-24 11:39:48 - ** ---------- .> app:         app_factory:0xffff24d828f0
2024-10-24 11:39:48 - ** ---------- .> transport:   redis://:**@redis:6379/1
2024-10-24 11:39:48 - ** ---------- .> results:     postgresql://postgres:**@db:5432/dify
2024-10-24 11:39:48 - *** --- * --- .> concurrency: 1 (gevent)
2024-10-24 11:39:48 -- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
2024-10-24 11:39:48 --- ***** ----- 
2024-10-24 11:39:48  -------------- [queues]
2024-10-24 11:39:48                 .> app_deletion     exchange=app_deletion(direct) key=app_deletion
2024-10-24 11:39:48                 .> dataset          exchange=dataset(direct) key=dataset
2024-10-24 11:39:48                 .> generation       exchange=generation(direct) key=generation
2024-10-24 11:39:48                 .> mail             exchange=mail(direct) key=mail
2024-10-24 11:39:48                 .> ops_trace        exchange=ops_trace(direct) key=ops_trace
2024-10-24 11:39:48 
2024-10-24 11:39:48 [tasks]
2024-10-24 11:39:48   . schedule.clean_embedding_cache_task.clean_embedding_cache_task
2024-10-24 11:39:48   . schedule.clean_unused_datasets_task.clean_unused_datasets_task
2024-10-24 11:39:48   . tasks.add_document_to_index_task.add_document_to_index_task
2024-10-24 11:39:48   . tasks.annotation.add_annotation_to_index_task.add_annotation_to_index_task
2024-10-24 11:39:48   . tasks.annotation.batch_import_annotations_task.batch_import_annotations_task
2024-10-24 11:39:48   . tasks.annotation.delete_annotation_index_task.delete_annotation_index_task
2024-10-24 11:39:48   . tasks.annotation.disable_annotation_reply_task.disable_annotation_reply_task
2024-10-24 11:39:48   . tasks.annotation.enable_annotation_reply_task.enable_annotation_reply_task
2024-10-24 11:39:48   . tasks.annotation.update_annotation_to_index_task.update_annotation_to_index_task
2024-10-24 11:39:48   . tasks.batch_create_segment_to_index_task.batch_create_segment_to_index_task
2024-10-24 11:39:48   . tasks.clean_dataset_task.clean_dataset_task
2024-10-24 11:39:48   . tasks.clean_document_task.clean_document_task
2024-10-24 11:39:04 None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2024-10-24 11:39:23 /app/api/.venv/lib/python3.10/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
2024-10-24 11:39:23   warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
2024-10-24 11:39:29 INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
2024-10-24 11:39:29 INFO  [alembic.runtime.migration] Will assume transactional DDL.
2024-10-24 11:39:33 None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
2024-10-24 11:39:44 /app/api/.venv/lib/python3.10/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
2024-10-24 11:39:44   warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
2024-10-24 11:39:48 /app/api/.venv/lib/python3.10/site-packages/celery/platforms.py:829: SecurityWarning: You're running the worker with superuser privileges: this is
2024-10-24 11:39:48 absolutely not recommended!

I can't tell if this is related to the issue, I'm just reporting the phenomenon
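To compare the two machines with numbers rather than by eye, the gap between two log lines can be computed from the timestamps. The following is only a sketch: it assumes every relevant line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the MacBook log above, and the function names are made up for illustration. Output from `docker logs --timestamps` uses an RFC 3339 timestamp instead, so the parsing would need adjusting for that.

```python
from datetime import datetime

# Marker strings copied from the log output pasted in this thread.
START_MARKER = "Running migrations"
DONE_MARKER = "Database migration successful!"


def parse_line(line):
    """Split 'YYYY-MM-DD HH:MM:SS message' into (datetime, message), or None."""
    try:
        stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None  # the line has no leading timestamp in the assumed format
    return stamp, line[19:].strip()


def seconds_between(lines, start_marker=START_MARKER, done_marker=DONE_MARKER):
    """Seconds from the first start_marker line to the first done_marker line."""
    start = done = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, message = parsed
        if start is None and start_marker in message:
            start = stamp
        elif done is None and done_marker in message:
            done = stamp
    if start is None or done is None:
        return None  # one of the markers never appeared
    return (done - start).total_seconds()


# Three lines taken from the MacBook log above.
sample = [
    "2024-10-24 11:39:00 Running migrations",
    "2024-10-24 11:39:29 Starting database migration.",
    "2024-10-24 11:39:29 Database migration successful!",
]
print(seconds_between(sample))  # 29.0
```

Running the same function over the server log, if it can be captured with timestamps, would show whether the hour is spent before the migration finishes or after it.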

Gaocarri avatar Oct 24 '24 07:10 Gaocarri

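I often run into this problem too. Every update feels like an ordeal: once it finishes, I can't get in anymore. I really don't dare to update anymore.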

Did you have the problem starting from 0.10.x?

I encountered this in previous versions as well.

wanshichenguang avatar Oct 24 '24 07:10 wanshichenguang

Hi @laipz8200, I tried 0.10.2 and got the same result. Could you possibly look into this issue?

Gaocarri avatar Oct 28 '24 12:10 Gaocarri

I am also finding that after updating to 0.10.2, restarting the container is blocked by one and the restart fails!

Hanfee avatar Oct 30 '24 04:10 Hanfee

I am also finding that after updating to 0.10.2, restarting the container is blocked by one and the restart fails!

I can confirm that my problem started with the upgrade from 0.9.2 to 0.10.0; 0.9.2 was fine. Did 0.10.0 and 0.10.1 work normally for you?

Gaocarri avatar Oct 30 '24 05:10 Gaocarri

I have the same problem. How can I resolve it?

cjhgit avatar Oct 30 '24 08:10 cjhgit

I have the same problem. How can I resolve it?

@cjhgit I could only wait 1 hour......
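For anyone who wants to measure the wait instead of checking by hand, here is a rough sketch that polls a TCP port until it accepts a connection and reports how long that took. It assumes the api port (5001 in the gunicorn log above) is reachable from wherever the script runs, which may not be the case in a default compose setup; `wait_for_port` is a made-up helper name, not part of Dify or Docker.

```python
import socket
import time


def wait_for_port(host, port, timeout=3600.0, interval=5.0):
    """Return seconds waited until host:port accepts a TCP connection.

    Returns None if the port is still not accepting connections after
    `timeout` seconds.
    """
    started = time.monotonic()
    while time.monotonic() - started < timeout:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=2.0):
                return time.monotonic() - started
        except OSError:
            time.sleep(interval)  # refused or unreachable: retry later
    return None


# Example call (host and port are assumptions, adjust to your deployment):
# waited = wait_for_port("127.0.0.1", 5001)
# print("api port not reachable" if waited is None else f"ready after {waited:.0f}s")
```

A successful TCP connect only shows that the server is listening, not that the application is healthy, but it is enough to put a number on the startup delay.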

Gaocarri avatar Oct 31 '24 01:10 Gaocarri

I have the same problem. How can I resolve it?

I solved this problem by using a high-performance computer. It wasted a lot of time

cjhgit avatar Nov 01 '24 11:11 cjhgit

I have the same problem. How can I resolve it?

I solved this problem by using a high-performance computer. It wasted a lot of time

What was the reason?

Gaocarri avatar Nov 04 '24 01:11 Gaocarri

Any progress on this issue?

prosscode avatar Nov 11 '24 11:11 prosscode

In fact, I haven't encountered this error myself. I hope you can provide the system you're using, the Docker version, and detailed logs to help us resolve the issue.

laipz8200 avatar Nov 11 '24 11:11 laipz8200

In fact, I haven't encountered this error myself. I hope you can provide the system you're using, the Docker version, and detailed logs to help us resolve the issue.

Server is Alibaba Cloud ECS: image

Other info: image

Docker version: image

The docker log has been provided at the beginning of the issue

Gaocarri avatar Nov 11 '24 11:11 Gaocarri

the previous issue ticket: issue/9874

image

Docker Engine version: 27.3.1
Docker Compose version: v2.29.7

prosscode avatar Nov 12 '24 07:11 prosscode

@laipz8200 Any progress? Have you reproduced this problem using other test environments?

Gaocarri avatar Nov 18 '24 01:11 Gaocarri

@Gaocarri We tested in an amd64 environment but still couldn’t pinpoint what happened. Could you try changing the operating system (e.g., to Ubuntu or Debian) and disabling swap to see if there are any changes?
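Before changing anything, it may help to confirm whether swap is active on the host at all. This is a minimal sketch that assumes a Linux host exposing `/proc/meminfo`; the helper names are invented for illustration.

```python
def swap_total_kb(meminfo_text):
    """Return the SwapTotal value in kB from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            # Line looks like: "SwapTotal:       2097148 kB"
            return int(line.split()[1])
    return None


def swap_enabled(path="/proc/meminfo"):
    """True if the host reports a non-zero SwapTotal (Linux only)."""
    with open(path) as handle:
        total = swap_total_kb(handle.read())
    return bool(total)
```

If `swap_enabled()` returns True on the slow server and False on a machine that starts quickly, that would be one data point worth reporting back here.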

laipz8200 avatar Nov 19 '24 07:11 laipz8200

Does the api service launch normally when you update to the latest version?

prosscode avatar Nov 20 '24 07:11 prosscode

@Gaocarri We tested in an amd64 environment but still couldn’t pinpoint what happened. Could you try changing the operating system (e.g., to Ubuntu or Debian) and disabling swap to see if there are any changes?

It's not practical for me to change the system ): Maybe I need to try starting a new service. Have you also reproduced this problem?

Gaocarri avatar Nov 20 '24 07:11 Gaocarri

Hi @Gaocarri, we have had a lot of updates recently. Are you still facing this issue in the newest version?

laipz8200 avatar Dec 06 '24 09:12 laipz8200

Note: I updated to version 0.12.1 and startup is now normal.

prosscode avatar Dec 06 '24 09:12 prosscode

I have the same problem on every version on a cloud VPS.

I have 8 GB RAM and 4 vCPUs.

root@cv4338665:~/dify/docker# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04.1 LTS
Release:        24.04
Codename:       noble
root@cv4338665:~/dify/docker# 

bikevit2008 avatar Dec 18 '24 00:12 bikevit2008

Upgraded to 0.12.x, 0.13.x, and 0.14.x; no such problem found.

Hanfee avatar Dec 19 '24 07:12 Hanfee

After 0.13.x, no problem.

Gaocarri avatar Feb 06 '25 12:02 Gaocarri