
Make Getting Started actually get someone started

Open · richardkmichael opened this issue · 2 comments

Description

I had to read a lot of the documentation to get torchserve installed via pip and running on Apple Silicon with a custom handler. During that process I tidied up the documentation and addressed several issues.

I left the work as small commits for easier review. I can squash them after a review and/or required changes.

This PR:

  • Tidies up formatting and grammar
  • Adds a missing PyYAML dependency
  • Prevents the MetricsCollector backend worker from crashing on non-NVIDIA hardware
  • Fixes the examples, which either did not work or were outdated (working examples are needed to understand how TorchServe is intended to work)
  • Adds the Model API to the other APIs in the sidebar for visibility (it is difficult to find otherwise)

richardkmichael · Aug 17 '24 02:08

A few comments:

Concerning the gRPC example:

In the gRPC examples, I think it would be better to install the Google proto files with pip, because:

  • programmers using gRPC may already have those packages installed
  • the install requirements are simpler: pip install grpcio-tools grpcio-status
  • it eliminates cloning the 60 MB submodule; also, submodules tend to be a confusing Git topic

I'm unsure why torchserve uses the proto files as a submodule instead of as a package dependency (pinned at a specific version, if changing proto definitions is the concern?).
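
For example, a pinned requirement (the version below is only illustrative) would freeze the proto definitions just as a submodule does:

pip install 'googleapis-common-protos==1.63.0'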

The googleapis submodule (at third_party/google/rpc) is older than the released package, so there are differences between the files in the torchserve repo and the package. I had this working, so I'm fairly confident it's fine, but here are the details.

torchserve appears to require only two Google proto files:

$ find . -path ./third_party\* -prune -o -name \*.proto -print0 | xargs -0 grep import

./frontend/server/src/main/resources/proto/inference.proto:import "google/protobuf/empty.proto";
./frontend/server/src/main/resources/proto/inference.proto:import "google/rpc/status.proto";

Although the pip-provided files are more recent, the two files torchserve needs, empty.proto and status.proto, are not among the files that differ from the submodule:

$ pip install grpcio-status
$ PY_SITE=../v/lib/python3.11/site-packages
$ # compare every .proto shipped by pip against the repo's submodule copy;
$ # cmp -i 29 skips each file's 29-byte copyright line, whose year may differ
$ for pip_p in $( pip show --files googleapis-common-protos | grep -F .proto ) ; do
    REPO_P=$( find third_party -path \*/${pip_p} -type f )
    if [[ ! -f ${REPO_P} ]] ; then
      echo "New in pip: ${pip_p}"
    elif ! cmp -s -i 29 ${PY_SITE}/${pip_p} ${REPO_P} ; then
      echo "Files diff: ${pip_p}"
    fi
  done

Files diff: google/api/client.proto
Files diff: google/api/documentation.proto
Files diff: google/api/error_reason.proto
Files diff: google/api/field_behavior.proto
New in pip: google/api/field_info.proto
Files diff: google/api/monitored_resource.proto
Files diff: google/cloud/extended_operations.proto
New in pip: google/gapic/metadata/gapic_metadata.proto

So this will work:

pip install grpcio-tools grpcio-status
SITE_PACKAGES=$( python -c 'import site; print(site.getsitepackages()[0])' )
python -m grpc_tools.protoc \
  --proto_path=$SITE_PACKAGES \
  --proto_path=./serve/frontend/server/src/main/resources/proto/ \
  --python_out=./serve/ts_scripts \
  --grpc_python_out=./serve/ts_scripts \
  ./serve/frontend/server/src/main/resources/proto/inference.proto \
  ./serve/frontend/server/src/main/resources/proto/management.proto
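
For reference, once the stubs are generated a client call is short. Here is a minimal sketch based on the PredictionsRequest / PredictionResponse messages in inference.proto; the port (TorchServe's default gRPC inference port, 7070) and the registered model name are assumptions:

import grpc

import inference_pb2
import inference_pb2_grpc

# connect to TorchServe's gRPC inference endpoint (default port 7070)
channel = grpc.insecure_channel("localhost:7070")
stub = inference_pb2_grpc.InferenceAPIsServiceStub(channel)

with open("docs/images/kitten_small.jpg", "rb") as f:
    image = f.read()

# "data" is the same input name the HTTP examples use
request = inference_pb2.PredictionsRequest(model_name="squeezenet1_1",
                                           input={"data": image})
response = stub.Predictions(request)
print(response.prediction.decode())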

It would also be helpful to include the torchserve inference proto files with the torchserve package. This would allow the examples to work with simply pip install torchserve, instead of cloning the repo.
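
As a sketch of what that could look like (the ts/protos/ location is hypothetical, not torchserve's current layout), the wheel could carry the .proto files as package data:

# setup.py (sketch): ship the proto definitions inside the wheel
from setuptools import setup, find_packages

setup(
    name="torchserve",
    packages=find_packages(),
    # bundle inference.proto and management.proto with the ts package
    package_data={"ts": ["protos/*.proto"]},
)

Clients could then locate the files with importlib.resources instead of needing a checkout.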

Another improvement would be a tiny dedicated package, e.g., torchserve-protos, torchserve-grpc, or grpc-torchserve (which would match other "grpc" package names), allowing developers to avoid depending on torchserve itself or manually copying the protos.
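
With such a package (all names below are hypothetical), the gRPC example setup would shrink to:

pip install torchserve-grpc
python -c 'from torchserve_grpc import inference_pb2, inference_pb2_grpc'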

Multi-input inference

The multi-input inference example is unclear or does not work. I have not changed it, but I believe it should be improved or clarified.

To get predictions from the loaded model which expects multiple inputs:

curl http://localhost:8080/predictions/squeezenet1_1 -F 'data=@docs/images/dogs-before.jpg' -F 'data=@docs/images/kitten_small.jpg'

This returns predictions for kitten_small.jpg only, even if batchSize: 2 is configured.
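
For reference, this is with batching set in the model's YAML config (batchSize and maxBatchDelay are documented TorchServe keys):

# model-config.yaml
batchSize: 2
maxBatchDelay: 100   # ms to wait for the batch to fill

Note that batching aggregates separate concurrent requests on the server side, which may be why two -F parts in one request do not form a batch.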

From the usage, and from reading the VisionHandler, it appears the intent is to predict each image separately and return a single array containing two sets of predictions? That would be similar to calling the API twice: curl -T "{kitten_small.jpg,dogs-before.jpg}" http://...

That reading is at odds with the phrasing "expects multiple inputs", which suggests a single model call with two inputs, e.g., a model which combines two images.
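
If the intended meaning is the latter, a custom handler could read two named parts from a single request. A minimal sketch; the field names left/right, the handler class, and the channel-wise torch.cat are illustrative only, not the example's actual handler:

import io

import torch
from PIL import Image
from torchvision.transforms.functional import to_tensor

from ts.torch_handler.base_handler import BaseHandler

class TwoImageHandler(BaseHandler):
    """One model call per request, built from two named image inputs."""

    def preprocess(self, data):
        pairs = []
        for row in data:  # one dict per request in the server-side batch
            # multipart fields arrive keyed by their form names
            left = Image.open(io.BytesIO(row["left"]))
            right = Image.open(io.BytesIO(row["right"]))
            # a real handler would resize/normalize before combining
            pairs.append(torch.cat([to_tensor(left), to_tensor(right)], dim=0))
        return torch.stack(pairs)

A request would then name both parts, e.g.: curl http://localhost:8080/predictions/<model> -F 'left=@dogs-before.jpg' -F 'right=@kitten_small.jpg'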

richardkmichael · Aug 17 '24 02:08

Hi @richardkmichael, thanks for the contribution, will go through the detailed changes tonight.

mreso · Aug 19 '24 22:08