Can't serve model.ts
Context
Hi! I trained Mask R-CNN with detectron2 and exported a TorchScript model (model.ts) following these instructions (I used scripting). Then I followed this example to start TorchServe. The only thing I changed is the path to the weights (I use my model.ts). Unfortunately, I get errors after running:
torchserve --start --model-store model_store --models maskrcnn=maskrcnn.mar
Here is my log:
java.lang.NoSuchMethodError: java.nio.file.Files.readString(Ljava/nio/file/Path;)Ljava/lang/String;
    at org.pytorch.serve.util.ConfigManager.readFile(ConfigManager.java:235)
    at org.pytorch.serve.util.ConfigManager.<init>(ConfigManager.java:139)
    at org.pytorch.serve.util.ConfigManager.init(ConfigManager.java:285)
    at org.pytorch.serve.ModelServer.main(ModelServer.java:83)
Maybe I am doing or understanding something wrong? Or is this a bug?
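For reference, a quick way I can check that the exported model.ts is a valid TorchScript module outside of TorchServe (the path below just matches my local layout, adjust as needed):
python3 -c "import torch; m = torch.jit.load('weights/model.ts'); print(type(m))"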
- torchserve version: 0.5.3
- torch-model-archiver version: 0.5.3
- torch version: 1.10.1+cu111
- torchvision version [if any]: 0.11.2+cu111
- torchtext version [if any]:
- torchaudio version [if any]:
- java version: 1.8.0_312
- Operating System and version: Ubuntu 20.04.4
Your Environment
- Installed using source? [yes/no]: installed with pip
- Are you planning to deploy it using docker container? [yes/no]: no
- Is it a CPU or GPU environment?: GPU
- Using a default/custom handler? [If possible upload/share custom handler/model]: default
- What kind of model is it e.g. vision, text, audio?: vision
- Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]: local
Expected Behavior
TorchServe should start.
Current Behavior
TorchServe fails with an error.
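The first stack trace points at the Java runtime rather than the model: java.nio.file.Files.readString(Path) only exists since Java 11, my environment had Java 1.8, and TorchServe requires Java 11 or newer. On Ubuntu 20.04 a newer JDK can be installed and selected with something like:
sudo apt install openjdk-11-jdk
sudo update-alternatives --config java
java -version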
After updating Java to version 11, I started getting another error:
2022-05-06T07:30:37,301 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - Python runtime: 3.8.10
2022-05-06T07:30:37,301 [DEBUG] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-maskrcnn_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2022-05-06T07:30:37,301 [INFO ] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2022-05-06T07:30:37,302 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9000.
2022-05-06T07:30:37,302 [INFO ] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1651822237302
2022-05-06T07:30:37,309 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - model_name: maskrcnn, batchSize: 1
2022-05-06T07:30:37,366 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - Backend worker process died.
2022-05-06T07:30:37,366 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - File "/home/argo/.local/lib/python3.8/site-packages/ts/model_loader.py", line 83, in load
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - module, function_name = self._load_handler_file(handler)
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - File "/home/argo/.local/lib/python3.8/site-packages/ts/model_loader.py", line 123, in _load_handler_file
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - module = importlib.import_module(module_name)
2022-05-06T07:30:37,367 [INFO ] epollEventLoopGroup-5-6 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - return _bootstrap._gcd_import(name[level:], package, level)
2022-05-06T07:30:37,367 [DEBUG] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - File "<frozen importlib._bootstrap>", line 991, in _find_and_load
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
2022-05-06T07:30:37,367 [DEBUG] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) ~[?:?]
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133) ~[?:?]
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) ~[?:?]
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:189) [model-server.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'object_detector'
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG -
2022-05-06T07:30:37,367 [WARN ] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: maskrcnn, error: Worker died.
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - During handling of the above exception, another exception occurred:
2022-05-06T07:30:37,367 [DEBUG] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-maskrcnn_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG -
2022-05-06T07:30:37,368 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2022-05-06T07:30:37,368 [WARN ] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-maskrcnn_1.0-stderr
2022-05-06T07:30:37,368 [WARN ] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-maskrcnn_1.0-stdout
2022-05-06T07:30:37,368 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - File "/home/argo/.local/lib/python3.8/site-packages/ts/model_service_worker.py", line 189, in <module>
2022-05-06T07:30:37,368 [INFO ] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 8 seconds.
2022-05-06T07:30:37,368 [INFO ] W-9000-maskrcnn_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-maskrcnn_1.0-stdout
2022-05-06T07:30:37,382 [INFO ] W-9000-maskrcnn_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-maskrcnn_1.0-stderr
2022-05-06T07:30:45,369 [DEBUG] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3, /home/argo/.local/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9000]
2022-05-06T07:30:45,985 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9000
2022-05-06T07:30:45,985 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - [PID]848210
And I get the same error when I try to use pretrained weights (not mine, but maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth).
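The ModuleNotFoundError suggests the worker ends up trying to import object_detector as a plain Python module, so it can help to look at what actually went into the archive. A .mar file is a zip archive, and (assuming the manifest sits at MAR-INF/MANIFEST.json, as in the archives I have seen) its contents and the recorded handler name can be inspected with:
unzip -l model_store/maskrcnn.mar
unzip -p model_store/maskrcnn.mar MAR-INF/MANIFEST.json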
@ArgoHA Can you provide the exact command you used to create the mar file?
@maaquib Sure, here it is:
torch-model-archiver --model-name maskrcnn --version 1.0 --model-file serve/examples/object_detector/maskrcnn/model.py --serialized-file weights/model.ts --handler object_detector --extra-files serve/examples/object_detector/index_to_name.json
then:
mkdir model_store
mv maskrcnn.mar model_store/
And finally I try to start it up and get the errors:
torchserve --start --model-store model_store --models maskrcnn=maskrcnn.mar
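One way to surface the failure reason directly, instead of digging through worker logs, is to start TorchServe without the --models flag and register the archive through the management API (default management port 8081 assumed), since a synchronous registration returns the load error in the HTTP response:
torchserve --start --model-store model_store
curl -X POST "http://localhost:8081/models?url=maskrcnn.mar&initial_workers=1&synchronous=true"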
Any luck with this?
@joemal1234 I did not manage to make it work; I ended up using a different serving method.