Huge performance gap on Intel Mac: bare metal vs Docker
🐛 Describe the bug
The results vary greatly between the two environments:
- running directly on a MacBook Pro 2019 (bare metal)
- running on the same machine, but inside Docker 20.10.17
The run command and the .mar file are exactly the same, yet the serving TPS differs dramatically.
run on bare metal
torchserve --start --model-store model-store --models all
wrk result
wrk -c 100 -t 6 -s content-1.lua --latency http://127.0.0.1:8080/predictions/bert -d 10
Running 10s test @ http://127.0.0.1:8080/predictions/bert
6 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 204.10ms 166.13ms 1.21s 93.10%
Req/Sec 98.43 30.03 151.00 72.46%
Latency Distribution
50% 154.06ms
75% 168.14ms
90% 298.35ms
99% 1.07s
5415 requests in 10.10s, 3.56MB read
Requests/sec: 536.39
Transfer/sec: 361.44KB
run in Docker
docker run --rm -it -p 8080:8080 -p 8081:8081 \
-p 8082:8082 -p 7070:7070 -p 7071:7071 \
--name bert \
--entrypoint=bash \
-v $(pwd)/model-store:/home/model-server/model-store \
pytorch/torchserve:0.6.0-cpu
and, inside the container, run the same command:
torchserve --start --model-store model-store --models all
wrk result
wrk -c 100 -t 6 -s content-1.lua --latency http://127.0.0.1:8080/predictions/bert -d 10
Running 10s test @ http://127.0.0.1:8080/predictions/bert
6 threads and 100 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 1.42s 496.67ms 1.97s 75.00%
Req/Sec 2.62 4.42 20.00 86.00%
Latency Distribution
50% 1.84s
75% 1.89s
90% 1.96s
99% 1.97s
62 requests in 10.06s, 41.73KB read
Socket errors: connect 0, read 0, write 0, timeout 50
Requests/sec: 6.16
Transfer/sec: 4.15KB
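To rule out a configuration difference between the two runs (rather than pure environment overhead), the registered model and its workers can be compared in both environments through TorchServe's management API. This is an extra check, not part of the original report:
# Describe the registered model; the response lists minWorkers, maxWorkers
# and the running workers, so both environments can be compared directly.
curl http://127.0.0.1:8081/models/bert
# Or query the specific version packaged above.
curl http://127.0.0.1:8081/models/bert/1.0.0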
Error logs
No errors, but TPS is roughly 500/s on bare metal versus about 7/s in Docker.
Installation instructions
- bare metal: installed from source, v0.6.0
- Docker: pytorch/torchserve:0.6.0-cpu
Model Packaging
torch-model-archiver --force --model-name bert \
--version 1.0.0 \
--serialized-file models/bert_model.pt \
--extra-files ./bert_record.py,./models/content_id_map.json \
--handler bert_handler.py
config.properties
#Saving snapshot
#Wed Aug 10 10:05:54 CST 2022
inference_address=http://0.0.0.0:8080
load_models=all
model_store=model-store
async_logging=true
number_of_gpu=0
job_queue_size=1000
python=/Users/eric/code/re/re-serving/venv/bin/python
model_snapshot={\n "name": "20220810100554930-startup.cfg",\n "modelCount": 1,\n "created": 1660097154931,\n "models": {\n "bert": {\n "1.0.0": {\n "defaultVersion": true,\n "marName": "bert.mar",\n "minWorkers": 12,\n "maxWorkers": 12,\n "batchSize": 1,\n "maxBatchDelay": 100,\n "responseTimeout": 120\n }\n }\n }\n}
tsConfigFile=logs/config/20220810100550643-shutdown.cfg
version=0.6.0
workflow_store=model-store
number_of_netty_threads=32
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
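The snapshot pins minWorkers and maxWorkers to 12. On macOS, Docker Desktop runs containers inside a VM that is often allocated fewer CPUs than the host has, so it may be worth confirming how many CPUs the container actually sees; this is an extra check based on that assumption, not something reported in the issue:
# CPUs available to the Docker daemon (i.e. the Docker Desktop VM)
docker info --format '{{.NCPU}}'
# CPUs visible inside the running container (container name from the docker run above)
docker exec bert nproc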
Versions
torch==1.12.0
torch-model-archiver==0.6.0
torch-workflow-archiver==0.2.4
torchserve==0.6.0
Repro instructions
Run the same .mar model in the two environments and compare wrk performance:
wrk -c 100 -t 6 -s content-1.lua --latency http://127.0.0.1:8080/predictions/bert -d 10
Possible Solution
No response
@lxning help
@yayuntian Can you provide the reproduction steps, particularly regarding the model (--serialized-file models/bert_model.pt --extra-files ./bert_record.py,./models/content_id_map.json)? I want to get a breakdown of where the latency is added: is it just the model, or also the server frontend?
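One way to get that breakdown (a suggestion, not something done in this thread) is to scrape TorchServe's metrics endpoint in both environments and compare the frontend queue latency with the backend inference latency, e.g. the ts_queue_latency_microseconds and ts_inference_latency_microseconds series:
# Prometheus-format metrics on the metrics_address port from config.properties
curl http://127.0.0.1:8082/metrics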
@yayuntian This is not a PyTorch issue. On macOS, Docker uses QEMU under the hood to emulate linux/aarch64, which is why performance drops. The only way around that is to not use Docker.
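Whether the container is actually being emulated can be checked directly; the commands below are a verification sketch using the image and container name from the report:
# Architecture the pulled image was built for
docker image inspect pytorch/torchserve:0.6.0-cpu --format '{{.Os}}/{{.Architecture}}'
# Architecture the kernel reports inside the running container
docker exec bert uname -m
# Platform of the Docker daemon (the Docker Desktop VM)
docker version --format '{{.Server.Os}}/{{.Server.Arch}}'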