Can't serve model.ts

Open ArgoHA opened this issue 2 years ago • 5 comments

Context

Hi! I trained a Mask R-CNN with detectron2 and exported it as a TorchScript model (model.ts) following these instructions (I used scripting). Then I used this example to start TorchServe. The only thing I changed is the path to the weights (I use my model.ts). Unfortunately, I get an error after running this command:

torchserve --start --model-store model_store --models maskrcnn=maskrcnn.mar

Here is my log:

java.lang.NoSuchMethodError: java.nio.file.Files.readString(Ljava/nio/file/Path;)Ljava/lang/String;
	at org.pytorch.serve.util.ConfigManager.readFile(ConfigManager.java:235)
	at org.pytorch.serve.util.ConfigManager.<init>(ConfigManager.java:139)
	at org.pytorch.serve.util.ConfigManager.init(ConfigManager.java:285)
	at org.pytorch.serve.ModelServer.main(ModelServer.java:83)

Am I doing or misunderstanding something, or is this a bug?
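The `NoSuchMethodError` above names `java.nio.file.Files.readString`, a method that was added to the JDK in Java 11, while the Java version reported for this environment is 1.8.0_312. A minimal sketch for checking the installed major version before starting the server might look like the following. The helper names `parse_java_major` and `installed_java_major` are hypothetical, and the parsing assumes the usual `version "..."` line printed by `java -version`.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` style output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("could not find a Java version string")
    first, second = match.group(1), match.group(2)
    # Legacy scheme: "1.8.0_312" means Java 8.
    # Modern scheme: "11.0.15" means Java 11.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major() -> int:
    """Run `java -version` and return the major version (hypothetical helper)."""
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(
        ["java", "-version"], capture_output=True, text=True, check=True
    )
    return parse_java_major(result.stderr)
```

Calling `installed_java_major()` on the machine in question and comparing the result against 11 would show whether the JVM on the `PATH` is too old for `Files.readString`.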

  • torchserve version: 0.5.3
  • torch-model-archiver version: 0.5.3
  • torch version: 1.10.1+cu111
  • torchvision version [if any]: 0.11.2+cu111
  • torchtext version [if any]:
  • torchaudio version [if any]:
  • java version: 1.8.0_312
  • Operating System and version: Ubuntu 20.04.4

Your Environment

  • Installed using source? [yes/no]: installed with pip
  • Are you planning to deploy it using docker container? [yes/no]: no
  • Is it a CPU or GPU environment?: GPU
  • Using a default/custom handler? [If possible upload/share custom handler/model]: default
  • What kind of model is it e.g. vision, text, audio?: vision
  • Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]: local

Expected Behavior

Torch Serve should start

Current Behavior

Torch Serve fails with the error shown above

ArgoHA avatar May 04 '22 19:05 ArgoHA

After updating Java to version 11, I started getting a different error:

2022-05-06T07:30:37,301 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - Python runtime: 3.8.10
2022-05-06T07:30:37,301 [DEBUG] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-maskrcnn_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2022-05-06T07:30:37,301 [INFO ] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /tmp/.ts.sock.9000
2022-05-06T07:30:37,302 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - Connection accepted: /tmp/.ts.sock.9000.
2022-05-06T07:30:37,302 [INFO ] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1651822237302
2022-05-06T07:30:37,309 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - model_name: maskrcnn, batchSize: 1
2022-05-06T07:30:37,366 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - Backend worker process died.
2022-05-06T07:30:37,366 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG -   File "/home/argo/.local/lib/python3.8/site-packages/ts/model_loader.py", line 83, in load
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG -     module, function_name = self._load_handler_file(handler)
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG -   File "/home/argo/.local/lib/python3.8/site-packages/ts/model_loader.py", line 123, in _load_handler_file
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG -     module = importlib.import_module(module_name)
2022-05-06T07:30:37,367 [INFO ] epollEventLoopGroup-5-6 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG -   File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG -     return _bootstrap._gcd_import(name[level:], package, level)
2022-05-06T07:30:37,367 [DEBUG] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 991, in _find_and_load
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG -   File "<frozen importlib._bootstrap>", line 973, in _find_and_load_unlocked
2022-05-06T07:30:37,367 [DEBUG] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died.
java.lang.InterruptedException: null
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056) ~[?:?]
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2133) ~[?:?]
	at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:432) ~[?:?]
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:189) [model-server.jar:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - ModuleNotFoundError: No module named 'object_detector'
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - 
2022-05-06T07:30:37,367 [WARN ] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: maskrcnn, error: Worker died.
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - During handling of the above exception, another exception occurred:
2022-05-06T07:30:37,367 [DEBUG] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-maskrcnn_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2022-05-06T07:30:37,367 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - 
2022-05-06T07:30:37,368 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2022-05-06T07:30:37,368 [WARN ] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-maskrcnn_1.0-stderr
2022-05-06T07:30:37,368 [WARN ] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - terminateIOStreams() threadName=W-9000-maskrcnn_1.0-stdout
2022-05-06T07:30:37,368 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG -   File "/home/argo/.local/lib/python3.8/site-packages/ts/model_service_worker.py", line 189, in <module>
2022-05-06T07:30:37,368 [INFO ] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 8 seconds.
2022-05-06T07:30:37,368 [INFO ] W-9000-maskrcnn_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-maskrcnn_1.0-stdout
2022-05-06T07:30:37,382 [INFO ] W-9000-maskrcnn_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-maskrcnn_1.0-stderr
2022-05-06T07:30:45,369 [DEBUG] W-9000-maskrcnn_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/usr/bin/python3, /home/argo/.local/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /tmp/.ts.sock.9000]
2022-05-06T07:30:45,985 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - Listening on port: /tmp/.ts.sock.9000
2022-05-06T07:30:45,985 [INFO ] W-9000-maskrcnn_1.0-stdout MODEL_LOG - [PID]848210

I get the same error when I use pretrained weights (not my own, but maskrcnn_resnet50_fpn_coco-bf2d0c1e.pth).
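In the log above, the first traceback ends with `ModuleNotFoundError: No module named 'object_detector'` and is followed by "During handling of the above exception, another exception occurred", but the second traceback is cut off. One way to surface the hidden error is to try the imports by hand in the same Python interpreter the worker uses (`/usr/bin/python3` in the log). The sketch below assumes the worker first tries the handler name as a top-level module and then falls back to the built-in handler package `ts.torch_handler`. That fallback is an assumption about TorchServe's loader and is not confirmed in this thread. `try_import` and `diagnose_handler` are hypothetical helper names.

```python
import importlib


def try_import(module_name: str):
    """Return (True, module) on success or (False, exception) on failure."""
    try:
        return True, importlib.import_module(module_name)
    except Exception as exc:  # the underlying failure may not be an ImportError
        return False, exc


def diagnose_handler(handler: str) -> dict:
    """Try the handler as a top-level module, then under ts.torch_handler.

    The second candidate reflects an assumed fallback location for
    built-in handlers; adjust it if your installation differs.
    """
    results = {}
    for candidate in (handler, f"ts.torch_handler.{handler}"):
        ok, outcome = try_import(candidate)
        results[candidate] = "ok" if ok else f"{type(outcome).__name__}: {outcome}"
    return results
```

Running `diagnose_handler("object_detector")` and printing the result would show which of the two imports fails and with what message, which is the part missing from the truncated log.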

ArgoHA avatar May 06 '22 07:05 ArgoHA

@ArgoHA Can you provide the exact command you used to create the mar file?

maaquib avatar May 06 '22 20:05 maaquib

@maaquib Sure, here it is:

torch-model-archiver --model-name maskrcnn --version 1.0 --model-file serve/examples/object_detector/maskrcnn/model.py --serialized-file weights/model.ts --handler object_detector --extra-files serve/examples/object_detector/index_to_name.json

then:

mkdir model_store
mv maskrcnn.mar model_store/

Finally, I try to start it and get the errors:

torchserve --start --model-store model_store --models maskrcnn=maskrcnn.mar
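To check what the `torch-model-archiver` command above actually packaged, the contents of `maskrcnn.mar` can be listed. The sketch below assumes the archive was written in the archiver's default format and that this format is a plain zip file. If a different archive format was selected, `zipfile` will raise `BadZipFile`. `list_mar_contents` is a hypothetical helper name.

```python
import zipfile


def list_mar_contents(mar_path: str) -> list:
    """Return the sorted file names stored in a .mar archive (assumed zip)."""
    with zipfile.ZipFile(mar_path) as archive:
        return sorted(archive.namelist())
```

For example, `list_mar_contents("model_store/maskrcnn.mar")` should show whether `model.ts`, `model.py`, and `index_to_name.json` ended up in the archive as intended.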

ArgoHA avatar May 06 '22 20:05 ArgoHA

Any luck with this?

joemal1234 avatar Sep 01 '22 15:09 joemal1234

@joemal1234 I did not manage to make it work; I used a different serving method.

ArgoHA avatar Sep 02 '22 16:09 ArgoHA