# Workflow API doesn't work with KServe V2 envelope
## Context
- torchserve version: 0.5.3
- torch-model-archiver version:
- torch version: 1.10.0
- torchvision version [if any]:
- torchtext version [if any]:
- torchaudio version [if any]:
- java version:
- Operating System and version: Ubuntu 18.04
## Your Environment
I use the image `pytorch/torchserve:0.5.3-cpu` as the environment.
- Installed using source? [yes/no]: No
- Are you planning to deploy it using docker container? [yes/no]: yes
- Is it a CPU or GPU environment?: CPU
- Using a default/custom handler? [If possible upload/share custom handler/model]: custom, see below.
- What kind of model is it e.g. vision, text, audio?: vision
I tested with two workflows:

- `dog_breed_classification` from the examples
- the default `mnist` model, wrapped in a test workflow built from the following files:
`mnist_workflow_handlers.py`:

```python
import json
import base64


def print_data(data, context):
    """Entry node: log the incoming request and pass each row's payload through."""
    print("+" * 50)
    print("node: print_data")
    print("data:", data)
    print("context:", context)
    if data is None:
        return data
    output = []
    for row in data:
        d = row.get("data") or row.get("body")
        # if isinstance(d, (bytes, bytearray)):
        #     d = d.decode('utf-8')
        output.append(d)
    return output


def test_data(data, context):
    """Parallel branch node: same pass-through as print_data, so combine_data has two parents."""
    print("+" * 50)
    print("node: test_data")
    print("data:", data)
    print("context:", context)
    if data is None:
        return data
    output = []
    for row in data:
        d = row.get("data") or row.get("body")
        # if isinstance(d, (bytes, bytearray)):
        #     d = d.decode('utf-8')
        output.append(d)
    return output


def combine_data(data, context):
    """Fan-in node: each row holds one key per parent node; return the print_data branch."""
    if data is None:
        return data
    print("+" * 50)
    print("node: combine_data")
    print("data:", data)
    print("context:", context)
    output = []
    for row in data:
        data_dict = {}
        for k in row:
            d = row.get(k)
            # if isinstance(d, (bytes, bytearray)):
            #     d = d.decode('utf-8')
            data_dict[k] = d
        output.append(data_dict)
    print([d.get("print_data") for d in output])
    return [d.get("print_data") for d in output]
```
`mnist_workflow.yaml`:

```yaml
models:
  min-workers: 1
  max-workers: 1
  batch-size: 4
  max-batch-delay: 100
  retry-attempts: 5
  timeout-ms: 300000

  mnist:
    url: mnist.mar

dag:
  print_data: [test_data, combine_data]
  test_data: [combine_data]
  combine_data: [mnist]
```
- Are you planning to use local models from model-store or public url being used e.g. from S3 bucket etc.? [If public url then provide link.]:
- Provide config.properties, logs [ts.log] and parameters used for model registration/update APIs:

```
inference_address=http://0.0.0.0:8000
management_address=http://0.0.0.0:8001
metrics_address=http://0.0.0.0:8002
number_of_netty_threads=32
job_queue_size=1000
install_py_dep_per_model=true
service_envelope=kservev2
load_models=all
```
- Link to your project [if any]:
## Expected Behavior

The Workflow API should work out of the box with the KServe v2 envelope.
## Current Behavior

The Workflow API does not work with the KServe v2 envelope.
## Reasons
1. The envelope of the workflow functions is not set correctly.

   https://github.com/pytorch/serve/blob/6a191f822f2e759c2aad07b8f9eacbedcb039827/frontend/server/src/main/java/org/pytorch/serve/wlm/ModelManager.java#L122

   While the models' envelopes are set to the default envelope specified in `config.properties`, the workflow functions' envelope is `null`.
2. The request envelope `kservev2.py` cannot handle internal workflow requests and nested requests properly.

   https://github.com/pytorch/serve/blob/master/ts/torch_handler/request_envelope/kservev2.py

   If a node has multiple parent nodes, for example node `A` with result `["a"]` and node `B` with result `["b"]`, the outputs of these nodes are combined by the frontend into:

   ```json
   [
     {
       "A": {"outputs": [{"data": ["a"]}]},
       "B": {"outputs": [{"data": ["b"]}]}
     }
   ]
   ```

   However, because there is no `data` or `body` key in this JSON (only `A` and `B`), the envelope class cannot find the data. And even if the data were found, the KServe v2 envelope can only parse the `inputs` key, not `outputs`.
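To make the failure concrete, here is a minimal runnable sketch of the lookup that fails (illustrative only; it mirrors the `data`/`body` lookup pattern used above, not the exact `kservev2.py` code):

```python
# The combined payload the frontend sends to a node that has two parents.
rows = [
    {
        "A": {"outputs": [{"data": ["a"]}]},
        "B": {"outputs": [{"data": ["b"]}]},
    }
]

for row in rows:
    body = row.get("data") or row.get("body")
    # body is None: the row only contains the parent-node keys "A" and "B".
    # And even if the payload were found, the parents' results sit under
    # "outputs", which the KServe v2 envelope never parses (it reads "inputs").
    assert body is None
```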
Also, in order to pass the data to the model properly, as the `null` envelope does, the request data needs to be converted into the form:

```json
[
  {
    "data": {
      "A": ["a"],
      "B": ["b"]
    }
  }
]
```
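For illustration, that conversion could look like the following sketch (`merge_parent_outputs` is a hypothetical helper, not part of TorchServe):

```python
def merge_parent_outputs(rows):
    # Hypothetical helper (illustration only): collapse each parent's
    # KServe v2 "outputs" into a single "data" dict per row.
    merged = []
    for row in rows:
        data = {parent: result["outputs"][0]["data"] for parent, result in row.items()}
        merged.append({"data": data})
    return merged

print(merge_parent_outputs([
    {"A": {"outputs": [{"data": ["a"]}]},
     "B": {"outputs": [{"data": ["b"]}]}}
]))
# -> [{'data': {'A': ['a'], 'B': ['b']}}]
```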
## Possible Solution
It is possible to set all the workflow models'/functions' envelopes to `null`, regardless of the `service_envelope` specified in `config.properties`. However, this creates an inconsistency between the model API standard and the workflow API standard: all the models inside a workflow would use the `null` envelope while other models use another envelope. Therefore, it is not a desirable solution.
In order to solve the issue and maintain consistency between the APIs of workflows and standalone models (see the sketch after this list):

- Set the envelope of workflow functions to the default envelope, as is done for regular models (those that have `.mar` files).
- Identify internal workflow requests between nodes, and make the KServe v2 envelope able to parse `outputs` data. For example, pass a special header from the frontend telling the backend that this is an internal workflow request.
- Identify the internal workflow request type, i.e. whether the data is `flat` or `nested`. For example, pass a special header from the frontend telling the backend whether the request is `flat` or `nested`.
- Have the KServe v2 envelope handler handle the data according to those headers.
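A minimal sketch of that header-based dispatch (the header names and the parsing details are hypothetical, not an existing TorchServe API):

```python
def parse_input(rows, headers):
    # Hypothetical headers (illustration only).
    if headers.get("ts-workflow-request") != "true":
        # Regular client request: standard KServe v2 "inputs" parsing.
        return [row["body"]["inputs"][0]["data"] for row in rows]
    if headers.get("ts-workflow-data") == "nested":
        # Multi-parent node: merge each parent's "outputs" into one dict.
        return [
            {parent: result["outputs"][0]["data"] for parent, result in row.items()}
            for row in rows
        ]
    # Single-parent node: unwrap the parent's "outputs" directly.
    return [row["outputs"][0]["data"] for row in rows]


# Example: an internal, nested workflow request with two parent nodes.
rows = [{"A": {"outputs": [{"data": ["a"]}]},
         "B": {"outputs": [{"data": ["b"]}]}}]
headers = {"ts-workflow-request": "true", "ts-workflow-data": "nested"}
print(parse_input(rows, headers))  # -> [{'A': ['a'], 'B': ['b']}]
```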
I've prepared a pull request and will submit it ASAP for your review. I have divided the changes into separate commits targeting the different problems, with a description in each commit message. Once all of the commits fulfill the requirements, I will add more unit tests, then rebase and squash the commits.

If possible, please give suggestions on whether I'm on the right track with this problem.
## Steps to Reproduce
1. Pull the image `pytorch/torchserve:0.5.3-cpu`.
2. Don't set the envelope in `config.properties`. Run the container with the mnist workflow I provided above, or with the example workflow `dog_breed_classification`. The prediction will succeed.
3. Set `service_envelope=kservev2` and run the container with the workflow from step 2. Now the prediction will fail (see the example request after this list).
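For concreteness, a hedged example of registering and invoking the mnist workflow via the management and inference APIs (it assumes the workflow archive is named `mnist_wf.war`, registers under the name `mnist_wf`, and that the ports match the `config.properties` above):

```python
import requests

# Register the workflow through the management API.
r = requests.post("http://localhost:8001/workflows", params={"url": "mnist_wf.war"})
print(r.status_code, r.text)

# Invoke the workflow through the inference API's wfpredict endpoint.
with open("0.png", "rb") as f:  # any MNIST test image
    r = requests.post("http://localhost:8000/wfpredict/mnist_wf", data=f)
print(r.status_code, r.text)
```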
## Failure Logs [if any]
I added some print statements to `ts/model_service_worker.py`:

```python
class TorchModelServiceWorker(object):
    @staticmethod
    def load_model(load_model_request):
        try:
            model_dir = load_model_request["modelPath"].decode("utf-8")
            model_name = load_model_request["modelName"].decode("utf-8")
            handler = load_model_request["handler"].decode("utf-8") if load_model_request["handler"] else None
            envelope = load_model_request["envelope"].decode("utf-8") if "envelope" in load_model_request else None
            envelope = envelope if envelope is not None and len(envelope) > 0 else None
            print("=" * 50)
            print("model name:", model_name)
            print("envelope: ", envelope)
            # ... (rest of load_model unchanged)
```
Below are the logs when loading `dog_breed_classification` (the lines prefixed with `+` come from the added print statements):

```
+ 2022-03-22T06:39:04,375 [INFO ] W-9024-dog_breed_wf_new__pre_processing_1.0-stdout MODEL_LOG - ==================================================
+ 2022-03-22T06:39:04,375 [INFO ] W-9024-dog_breed_wf_new__pre_processing_1.0-stdout MODEL_LOG - model_name: dog_breed_wf_new__pre_processing
+ 2022-03-22T06:39:04,375 [INFO ] W-9024-dog_breed_wf_new__pre_processing_1.0-stdout MODEL_LOG - envelope: None
2022-03-22T06:39:04,375 [INFO ] W-9024-dog_breed_wf_new__pre_processing_1.0-stdout MODEL_LOG - model_name: dog_breed_wf_new__pre_processing, batchSize: 1
2022-03-22T06:39:04,376 [INFO ] W-9024-dog_breed_wf_new__pre_processing_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1
2022-03-22T06:39:04,376 [DEBUG] W-9024-dog_breed_wf_new__pre_processing_1.0 org.pytorch.serve.wlm.WorkerThread - W-9024-dog_breed_wf_new__pre_processing_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2022-03-22T06:39:04,376 [INFO ] W-9024-dog_breed_wf_new__pre_processing_1.0 TS_METRICS - W-9024-dog_breed_wf_new__pre_processing_1.0.ms:959|#Level:Host|#hostname:8b026932f222,timestamp:1647931144
2022-03-22T06:39:04,376 [INFO ] W-9024-dog_breed_wf_new__pre_processing_1.0 TS_METRICS - WorkerThreadTime.ms:1|#Level:Host|#hostname:8b026932f222,timestamp:null
2022-03-22T06:39:04,875 [DEBUG] pool-6-thread-2 org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model dog_breed_wf_new__dog_breed_classification
2022-03-22T06:39:04,875 [DEBUG] pool-6-thread-2 org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model dog_breed_wf_new__dog_breed_classification
2022-03-22T06:39:04,875 [INFO ] pool-6-thread-2 org.pytorch.serve.wlm.ModelManager - Model dog_breed_wf_new__dog_breed_classification loaded.
2022-03-22T06:39:04,875 [DEBUG] pool-6-thread-2 org.pytorch.serve.wlm.ModelManager - updateModel: dog_breed_wf_new__dog_breed_classification, count: 1
2022-03-22T06:39:04,876 [DEBUG] W-9026-dog_breed_wf_new__dog_breed_classification_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.8/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9026]
2022-03-22T06:39:05,040 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9025
2022-03-22T06:39:05,041 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0-stdout MODEL_LOG - [PID]14321
2022-03-22T06:39:05,041 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0-stdout MODEL_LOG - Torch worker started.
2022-03-22T06:39:05,041 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0-stdout MODEL_LOG - Python runtime: 3.8.0
2022-03-22T06:39:05,041 [DEBUG] W-9025-dog_breed_wf_new__cat_dog_classification_1.0 org.pytorch.serve.wlm.WorkerThread - W-9025-dog_breed_wf_new__cat_dog_classification_1.0 State change null -> WORKER_STARTED
2022-03-22T06:39:05,041 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9025
2022-03-22T06:39:05,042 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9025.
2022-03-22T06:39:05,042 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1647931145042
+ 2022-03-22T06:39:05,042 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0-stdout MODEL_LOG - ==================================================
+ 2022-03-22T06:39:05,043 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0-stdout MODEL_LOG - model_name: dog_breed_wf_new__cat_dog_classification
+ 2022-03-22T06:39:05,043 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0-stdout MODEL_LOG - envelope: kservev2
2022-03-22T06:39:05,043 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0-stdout MODEL_LOG - model_name: dog_breed_wf_new__cat_dog_classification, batchSize: 4
2022-03-22T06:39:05,781 [INFO ] W-9026-dog_breed_wf_new__dog_breed_classification_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9026
2022-03-22T06:39:05,782 [INFO ] W-9026-dog_breed_wf_new__dog_breed_classification_1.0-stdout MODEL_LOG - [PID]14334
2022-03-22T06:39:05,782 [INFO ] W-9026-dog_breed_wf_new__dog_breed_classification_1.0-stdout MODEL_LOG - Torch worker started.
2022-03-22T06:39:05,782 [INFO ] W-9026-dog_breed_wf_new__dog_breed_classification_1.0-stdout MODEL_LOG - Python runtime: 3.8.0
2022-03-22T06:39:05,782 [DEBUG] W-9026-dog_breed_wf_new__dog_breed_classification_1.0 org.pytorch.serve.wlm.WorkerThread - W-9026-dog_breed_wf_new__dog_breed_classification_1.0 State change null -> WORKER_STARTED
2022-03-22T06:39:05,782 [INFO ] W-9026-dog_breed_wf_new__dog_breed_classification_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9026
2022-03-22T06:39:05,783 [INFO ] W-9026-dog_breed_wf_new__dog_breed_classification_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9026.
2022-03-22T06:39:05,783 [INFO ] W-9026-dog_breed_wf_new__dog_breed_classification_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req. to backend at: 1647931145783
+ 2022-03-22T06:39:05,783 [INFO ] W-9026-dog_breed_wf_new__dog_breed_classification_1.0-stdout MODEL_LOG - ==================================================
+ 2022-03-22T06:39:05,783 [INFO ] W-9026-dog_breed_wf_new__dog_breed_classification_1.0-stdout MODEL_LOG - model_name: dog_breed_wf_new__dog_breed_classification
+ 2022-03-22T06:39:05,783 [INFO ] W-9026-dog_breed_wf_new__dog_breed_classification_1.0-stdout MODEL_LOG - envelope: kservev2
2022-03-22T06:39:05,783 [INFO ] W-9026-dog_breed_wf_new__dog_breed_classification_1.0-stdout MODEL_LOG - model_name: dog_breed_wf_new__dog_breed_classification, batchSize: 4
2022-03-22T06:39:06,366 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0-stdout MODEL_LOG - Missing the index_to_name.json file. Inference output will not include class name.
2022-03-22T06:39:06,366 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1324
2022-03-22T06:39:06,366 [DEBUG] W-9025-dog_breed_wf_new__cat_dog_classification_1.0 org.pytorch.serve.wlm.WorkerThread - W-9025-dog_breed_wf_new__cat_dog_classification_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2022-03-22T06:39:06,366 [INFO ] W-9025-dog_breed_wf_new__cat_dog_classification_1.0 TS_METRICS - W-9025-dog_breed_wf_new__cat_dog_classification_1.0.ms:2231|#Level:Host|#hostname:8b026932f222,tim
```