
How to register a workflow directly when Docker is started

jack-gits opened this issue 3 years ago · 8 comments

🚀 The feature

How can a workflow be registered directly when the Docker container is started?

Motivation, pitch

How can a workflow be registered directly when the Docker container is started?

Alternatives

No response

Additional context

No response

jack-gits avatar Sep 02 '22 14:09 jack-gits

EDIT: This doesn't work.

Adding a curl request after the torchserve start command in dockerd-entrypoint.sh might be a solution, but I'm not sure whether it will work. I will update here after I've tried it. What I mean is something like:

#!/bin/bash
set -e

if [[ "$1" = "serve" ]]; then
    shift 1
    torchserve --start --ts-config /home/model-server/config.properties
    curl -X POST "http://0.0.0.0:8081/workflows?url=example_workflow.war"
else
    eval "$@"
fi

# prevent docker exit
tail -f /dev/null

samils7 avatar Oct 21 '22 11:10 samils7

Does anyone have a solution for this issue?

samils7 avatar Nov 22 '22 14:11 samils7

The problem is caused by the command

torchserve --start --ts-config /home/model-server/config.properties

not returning. It starts TorchServe but blocks on this line, so the curl request after it is never executed.
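As a generic shell illustration (not TorchServe-specific): a foreground command blocks the script until it finishes, while a trailing `&` runs it in the background so the next line executes immediately:

```shell
#!/bin/bash
# Foreground: the script would wait here until the command finishes.
# Background ('&'): the script continues immediately.
sleep 2 &             # start a long-running job in the background
bg_pid=$!             # remember its PID
echo "script continued immediately (background pid $bg_pid)"
wait "$bg_pid"        # optionally block later until the job is done
echo "background job finished"
```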

I tried starting TorchServe in the background by adding nohup to this command:

nohup torchserve --start --ts-config /home/model-server/config.properties > log.txt &
sleep 10
curl -X POST "http://0.0.0.0:8081/workflows?url=example_workflow.war"

Sleeping for 10 seconds is necessary here, since TorchServe takes some time to start.

This solution is not robust, though.
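A more robust pattern than a fixed sleep is to poll the management API until it responds before registering the workflow. Below is a sketch of such an entrypoint, assuming the default management port 8081 and its GET /models endpoint; the config and .war paths are the placeholders used earlier in this thread:

```shell
#!/bin/bash

# Retry a command once per second until it succeeds or the timeout expires.
# Usage: wait_for <timeout_seconds> <command...>
wait_for() {
  local timeout=$1; shift
  local waited=0
  until "$@" > /dev/null 2>&1; do
    sleep 1
    waited=$((waited + 1))
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for: $*" >&2
      return 1
    fi
  done
}

if [[ "${1:-}" = "serve" ]]; then
  shift 1
  torchserve --start --ts-config /home/model-server/config.properties
  # Register the workflow once the management API answers,
  # instead of guessing a delay.
  wait_for 60 curl -sf "http://0.0.0.0:8081/models"
  curl -X POST "http://0.0.0.0:8081/workflows?url=example_workflow.war"
  # Prevent the container from exiting.
  tail -f /dev/null
else
  eval "$@"
fi
```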

samils7 avatar Nov 23 '22 08:11 samils7

@samils7 Do you have any solution for this issue?

RTae avatar Jun 29 '23 03:06 RTae

In the Dockerfile:

nohup torchserve --start --ts-config /home/model-server/config.properties > log.txt &
sleep 10
curl -X POST "http://0.0.0.0:8081/workflows?url=example_workflow.war"

This may work for you.

samils7 avatar Jun 29 '23 19:06 samils7

@samils7 It is not working. I deployed the image with your suggested entrypoint to GCP Cloud Run, but the curl request hangs.

[Screenshot: terminal showing the curl request hanging]

RTae avatar Nov 14 '23 11:11 RTae

@samils7

I tried this and it works for me. This is the only change in my code:

diff --git a/docker/dockerd-entrypoint.sh b/docker/dockerd-entrypoint.sh
index 41ba00b0..86da7c4c 100755
--- a/docker/dockerd-entrypoint.sh
+++ b/docker/dockerd-entrypoint.sh
@@ -3,7 +3,9 @@ set -e
 
 if [[ "$1" = "serve" ]]; then
     shift 1
-    torchserve --start --ts-config /home/model-server/config.properties
+    torchserve --start --ts-config /home/model-server/config.properties 
+    sleep 10
+    curl -X POST "http://0.0.0.0:8081/workflows?url=dog_breed_wf.war"

docker run --rm -it -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 -v $(pwd)/model_store:/home/model-server/model-store -v $(pwd)/wf_store:/home/model-server/wf-store pytorch/torchserve:latest
WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2023-11-14T18:49:15,368 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2023-11-14T18:49:15,370 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2023-11-14T18:49:15,424 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2023-11-14T18:49:15,515 [INFO ] main org.pytorch.serve.ModelServer - 
Torchserve version: 0.9.0
TS Home: /home/venv/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 0
Number of CPUs: 8
Max heap size: 7936 M
Python executable: /home/venv/bin/python
Config file: /home/model-server/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: N/A
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 32
Netty client threads: 0
Default workers per model: 8
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: log
Disable system metrics: false
Workflow Store: /home/model-server/model-store
Model config: N/A
2023-11-14T18:49:15,521 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2023-11-14T18:49:15,538 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2023-11-14T18:49:15,580 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2023-11-14T18:49:15,580 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2023-11-14T18:49:15,581 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2023-11-14T18:49:15,582 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2023-11-14T18:49:15,582 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
2023-11-14T18:49:15,799 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:f54202902624,timestamp:1699987755
2023-11-14T18:49:15,800 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:206.8248176574707|#Level:Host|#hostname:f54202902624,timestamp:1699987755
2023-11-14T18:49:15,800 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:83.72612762451172|#Level:Host|#hostname:f54202902624,timestamp:1699987755
2023-11-14T18:49:15,801 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:28.8|#Level:Host|#hostname:f54202902624,timestamp:1699987755
2023-11-14T18:49:15,801 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:30589.96875|#Level:Host|#hostname:f54202902624,timestamp:1699987755
2023-11-14T18:49:15,801 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:675.4453125|#Level:Host|#hostname:f54202902624,timestamp:1699987755
2023-11-14T18:49:15,802 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:3.6|#Level:Host|#hostname:f54202902624,timestamp:1699987755
2023-11-14T18:49:24,988 [DEBUG] pool-6-thread-1 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model dog_breed_wf__pre_processing
2023-11-14T18:49:24,989 [DEBUG] pool-6-thread-1 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model dog_breed_wf__pre_processing
2023-11-14T18:49:24,989 [INFO ] pool-6-thread-1 org.pytorch.serve.wlm.ModelManager - Model dog_breed_wf__pre_processing loaded.
2023-11-14T18:49:24,990 [DEBUG] pool-6-thread-1 org.pytorch.serve.wlm.ModelManager - updateModel: dog_breed_wf__pre_processing, count: 1
2023-11-14T18:49:24,995 [DEBUG] W-9000-dog_breed_wf__pre_processing_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-11-14T18:49:25,734 [DEBUG] pool-6-thread-3 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model dog_breed_wf__cat_dog_classification
2023-11-14T18:49:25,735 [DEBUG] pool-6-thread-3 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model dog_breed_wf__cat_dog_classification
2023-11-14T18:49:25,735 [INFO ] pool-6-thread-3 org.pytorch.serve.wlm.ModelManager - Model dog_breed_wf__cat_dog_classification loaded.
2023-11-14T18:49:25,735 [DEBUG] pool-6-thread-3 org.pytorch.serve.wlm.ModelManager - updateModel: dog_breed_wf__cat_dog_classification, count: 1
2023-11-14T18:49:25,736 [DEBUG] W-9001-dog_breed_wf__cat_dog_classification_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9001, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-11-14T18:49:26,307 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9000, pid=53
2023-11-14T18:49:26,309 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2023-11-14T18:49:26,359 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-11-14T18:49:26,359 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0-stdout MODEL_LOG - [PID]53
2023-11-14T18:49:26,359 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0-stdout MODEL_LOG - Torch worker started.
2023-11-14T18:49:26,360 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2023-11-14T18:49:26,360 [DEBUG] W-9000-dog_breed_wf__pre_processing_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-dog_breed_wf__pre_processing_1.0 State change null -> WORKER_STARTED
2023-11-14T18:49:26,363 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2023-11-14T18:49:26,367 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2023-11-14T18:49:26,370 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1699987766370
2023-11-14T18:49:26,393 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0-stdout MODEL_LOG - model_name: dog_breed_wf__pre_processing, batchSize: 1
2023-11-14T18:49:26,395 [DEBUG] W-9000-dog_breed_wf__pre_processing_1.0 org.pytorch.serve.wlm.WorkerThread - sent a reply, jobdone: true
2023-11-14T18:49:26,395 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1
2023-11-14T18:49:26,395 [DEBUG] W-9000-dog_breed_wf__pre_processing_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-dog_breed_wf__pre_processing_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2023-11-14T18:49:26,395 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:1402.0|#WorkerName:W-9000-dog_breed_wf__pre_processing_1.0,Level:Host|#hostname:f54202902624,timestamp:1699987766
2023-11-14T18:49:26,396 [INFO ] W-9000-dog_breed_wf__pre_processing_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:25.0|#Level:Host|#hostname:f54202902624,timestamp:1699987766
2023-11-14T18:49:26,534 [DEBUG] pool-6-thread-2 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model dog_breed_wf__dog_breed_classification
2023-11-14T18:49:26,535 [DEBUG] pool-6-thread-2 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model dog_breed_wf__dog_breed_classification
2023-11-14T18:49:26,535 [INFO ] pool-6-thread-2 org.pytorch.serve.wlm.ModelManager - Model dog_breed_wf__dog_breed_classification loaded.
2023-11-14T18:49:26,535 [DEBUG] pool-6-thread-2 org.pytorch.serve.wlm.ModelManager - updateModel: dog_breed_wf__dog_breed_classification, count: 1
2023-11-14T18:49:26,536 [DEBUG] W-9002-dog_breed_wf__dog_breed_classification_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9002, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2023-11-14T18:49:27,059 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9001, pid=66
2023-11-14T18:49:27,060 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9001
2023-11-14T18:49:27,111 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-11-14T18:49:27,111 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0-stdout MODEL_LOG - [PID]66
2023-11-14T18:49:27,112 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0-stdout MODEL_LOG - Torch worker started.
2023-11-14T18:49:27,112 [DEBUG] W-9001-dog_breed_wf__cat_dog_classification_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-dog_breed_wf__cat_dog_classification_1.0 State change null -> WORKER_STARTED
2023-11-14T18:49:27,112 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2023-11-14T18:49:27,112 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9001
2023-11-14T18:49:27,113 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9001.
2023-11-14T18:49:27,113 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1699987767113
2023-11-14T18:49:27,130 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0-stdout MODEL_LOG - model_name: dog_breed_wf__cat_dog_classification, batchSize: 4
2023-11-14T18:49:27,793 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0-stdout MODEL_LOG - generated new fontManager
2023-11-14T18:49:27,822 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9002, pid=80
2023-11-14T18:49:27,824 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9002
2023-11-14T18:49:27,874 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2023-11-14T18:49:27,874 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0-stdout MODEL_LOG - [PID]80
2023-11-14T18:49:27,874 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0-stdout MODEL_LOG - Torch worker started.
2023-11-14T18:49:27,874 [DEBUG] W-9002-dog_breed_wf__dog_breed_classification_1.0 org.pytorch.serve.wlm.WorkerThread - W-9002-dog_breed_wf__dog_breed_classification_1.0 State change null -> WORKER_STARTED
2023-11-14T18:49:27,875 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0-stdout MODEL_LOG - Python runtime: 3.9.18
2023-11-14T18:49:27,875 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9002
2023-11-14T18:49:27,876 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9002.
2023-11-14T18:49:27,876 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD to backend at: 1699987767876
2023-11-14T18:49:27,897 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0-stdout MODEL_LOG - model_name: dog_breed_wf__dog_breed_classification, batchSize: 4
2023-11-14T18:49:28,023 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-11-14T18:49:28,024 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2023-11-14T18:49:28,219 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0-stdout MODEL_LOG - '/home/model-server/tmp/models/c005a8b533594d26818e402eda451005/index_to_name.json' is missing. Inference output will not include class name.
2023-11-14T18:49:28,220 [DEBUG] W-9001-dog_breed_wf__cat_dog_classification_1.0 org.pytorch.serve.wlm.WorkerThread - sent a reply, jobdone: true
2023-11-14T18:49:28,220 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1090
2023-11-14T18:49:28,220 [DEBUG] W-9001-dog_breed_wf__cat_dog_classification_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-dog_breed_wf__cat_dog_classification_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2023-11-14T18:49:28,220 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:2484.0|#WorkerName:W-9001-dog_breed_wf__cat_dog_classification_1.0,Level:Host|#hostname:f54202902624,timestamp:1699987768
2023-11-14T18:49:28,221 [INFO ] W-9001-dog_breed_wf__cat_dog_classification_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:18.0|#Level:Host|#hostname:f54202902624,timestamp:1699987768
2023-11-14T18:49:28,586 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2023-11-14T18:49:28,586 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2023-11-14T18:49:28,968 [DEBUG] W-9002-dog_breed_wf__dog_breed_classification_1.0 org.pytorch.serve.wlm.WorkerThread - sent a reply, jobdone: true
2023-11-14T18:49:28,968 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1071
2023-11-14T18:49:28,968 [DEBUG] W-9002-dog_breed_wf__dog_breed_classification_1.0 org.pytorch.serve.wlm.WorkerThread - W-9002-dog_breed_wf__dog_breed_classification_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2023-11-14T18:49:28,968 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:2433.0|#WorkerName:W-9002-dog_breed_wf__dog_breed_classification_1.0,Level:Host|#hostname:f54202902624,timestamp:1699987768
2023-11-14T18:49:28,969 [INFO ] W-9002-dog_breed_wf__dog_breed_classification_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:22.0|#Level:Host|#hostname:f54202902624,timestamp:1699987768
2023-11-14T18:49:28,971 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /127.0.0.1:51066 "POST /workflows?url=dog_breed_wf.war HTTP/1.1" 200 4001
2023-11-14T18:49:28,972 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:f54202902624,timestamp:1699987768
{
  "status": "Workflow dog_breed_wf has been registered and scaled successfully."
}


agunapal avatar Nov 14 '23 18:11 agunapal

@agunapal Have you ever tried this with Cloud Run? My curl request gets stuck.

RTae avatar Nov 15 '23 06:11 RTae