
I think I have started the model successfully, but there is a problem I can't track down

Open jdddp opened this issue 3 years ago • 1 comment

🐛 Describe the bug

I can't curl the server; in fact, I found that the port (8080) is not in use. Could you give me some help?
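For reference, a minimal way to reproduce the check, assuming the default addresses shown in the startup log below (/ping and /models are the standard TorchServe inference health and management endpoints):

# see whether anything is listening on the TorchServe ports
ss -ltnp | grep -E ':(8080|8081|8082)'

# health check against the inference API (expected: {"status": "Healthy"})
curl http://127.0.0.1:8080/ping

# list registered models via the management API
curl http://127.0.0.1:8081/models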

Error logs

2022-09-16T19:38:09,388 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2022-09-16T19:38:09,535 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.6.0
TS Home: /home/jzp/miniconda3/envs/serve_new/lib/python3.10/site-packages
Current directory: /home/jzp/codes/model_online/models
Temp directory: /tmp
Number of GPUs: 1
Number of CPUs: 36
Max heap size: 16000 M
Python executable: /home/jzp/miniconda3/envs/serve_new/bin/python3.10
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: /home/jzp/codes/model_online/models/model_store
Initial Models: tmp=yoloxs.mar
Log dir: /home/jzp/codes/model_online/models/logs
Metrics dir: /home/jzp/codes/model_online/models/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.|http(s)?://.]
Custom python dependency for model allowed: false
Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /home/jzp/codes/model_online/models/model_store
Model config: N/A
2022-09-16T19:38:09,547 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2022-09-16T19:38:09,567 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: yoloxs.mar
2022-09-16T19:38:09,989 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model tmp
2022-09-16T19:38:09,989 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model tmp
2022-09-16T19:38:09,989 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model tmp loaded.
2022-09-16T19:38:09,989 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: tmp, count: 1
2022-09-16T19:38:09,997 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2022-09-16T19:38:09,997 [DEBUG] W-9000-tmp_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/jzp/miniconda3/envs/serve_new/bin/python3.10, /home/jzp/miniconda3/envs/serve_new/lib/python3.10/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9000]
2022-09-16T19:38:10,053 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2022-09-16T19:38:10,053 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2022-09-16T19:38:10,054 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2022-09-16T19:38:10,054 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2022-09-16T19:38:10,055 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
2022-09-16T19:38:10,217 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2022-09-16T19:38:10,576 [INFO ] W-9000-tmp_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9000
2022-09-16T19:38:10,578 [INFO ] W-9000-tmp_1.0-stdout MODEL_LOG - [PID]215423
2022-09-16T19:38:10,578 [INFO ] W-9000-tmp_1.0-stdout MODEL_LOG - Torch worker started.
2022-09-16T19:38:10,578 [DEBUG] W-9000-tmp_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-tmp_1.0 State change null -> WORKER_STARTED
2022-09-16T19:38:10,578 [INFO ] W-9000-tmp_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2022-09-16T19:38:10,583 [INFO ] W-9000-tmp_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2022-09-16T19:38:10,594 [INFO ] W-9000-tmp_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9000.
2022-09-16T19:38:10,598 [INFO ] W-9000-tmp_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1663328290598
2022-09-16T19:38:10,639 [INFO ] W-9000-tmp_1.0-stdout MODEL_LOG - model_name: tmp, batchSize: 1

Installation instructions

I used it successfully in April, but recently something went wrong when I tried to use it again.

Model Packaging

I used torch.jit to get model.pt and wrote my own handler.py, then packaged them with:

torch-model-archiver \
  --model-name yoloxs \
  --version 1.0 \
  --serialized-file /home/jzp/codes/model_online/models/model_pt/0612_gpu.pt \
  --export-path /home/jzp/codes/model_online/models/model_mar \
  --handler /home/jzp/codes/model_online/personal_file/yolox_handler.py -f
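For context, the serving step would then look roughly like this. This is a sketch, not necessarily the exact command used: it assumes the resulting yoloxs.mar is copied from the --export-path into the model store shown in the startup log, and registered under the alias tmp to match "Initial Models: tmp=yoloxs.mar".

# assumption: .mar is moved from the export path into the model store from the logs
cp /home/jzp/codes/model_online/models/model_mar/yoloxs.mar \
   /home/jzp/codes/model_online/models/model_store/

# start TorchServe with the model registered under the alias "tmp"
torchserve --start --ncs \
  --model-store /home/jzp/codes/model_online/models/model_store \
  --models tmp=yoloxs.mar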

config.properties

No response

Versions

Torchserve branch:

torchserve==0.6.0 torch-model-archiver==0.6.0

Python version: 3.10 (64-bit runtime)
Python executable: /home/jzp/miniconda3/envs/serve_new/bin/python

Versions of relevant python libraries:
captum==0.5.0
future==0.18.2
numpy==1.23.3
nvgpu==0.9.0
psutil==5.9.2
requests==2.28.1
torch==1.12.1
torch-model-archiver==0.6.0
torch-workflow-archiver==0.2.4
torchserve==0.6.0
torchvision==0.13.1
wheel==0.37.1
**Warning: torchtext not present ..
**Warning: torchaudio not present ..

Java Version:

OS: Ubuntu 20.04.4 LTS
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: N/A
CMake version: N/A

Is CUDA available: Yes
CUDA runtime version: 11.3.58
GPU models and configuration:
GPU 0: NVIDIA RTX A6000
GPU 1: NVIDIA RTX A6000
Nvidia driver version: 470.141.03
cuDNN version: Probably one of the following:
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
/usr/local/cuda-11.3/targets/x86_64-linux/lib/libcudnn_ops_train.so.8

java -version
openjdk 17.0.4 2022-07-19
OpenJDK Runtime Environment (build 17.0.4+8-Ubuntu-120.04)
OpenJDK 64-Bit Server VM (build 17.0.4+8-Ubuntu-120.04, mixed mode, sharing)

Repro instructions

I don't understand this question.

Possible Solution

No response

jdddp avatar Sep 16 '22 11:09 jdddp

@jdddp I don't see anything wrong in the logs you have shared. Can you please share the model file and the handler so I can try to repro?

agunapal avatar Sep 22 '22 17:09 agunapal