Data-oriented & SOLID `benchmarks/benchmark-ab.py`
🚀 The feature
Wrote this after an offline discussion with @lxning
We've recently standardized on using benchmarks/benchmark-ab.py as the preferred way to benchmark TorchServe models and used it to build benchmarks/auto_benchmark.py, which is now integrated in our CI. As our benchmark suite becomes more useful, I recently tried to extend it with Docker support and noticed we could do a better job of refactoring it.
So below I'll describe the problems with our current benchmark suite and how to improve them with a more data-oriented design using @dataclass and a more SOLID design using ABC, which will make it possible and easy to test all of our benchmark scripts.
Motivation, pitch
At a high level, our existing benchmark scripts work as follows:
- Take some configuration data for a benchmark run, which includes model-specific data, benchmark configuration, and some temporary directories
- Pick an environment to initialize (Docker vs local)
- Run some inferences on a model
- Inspect the generated metrics files on disk and parse them into an artifact
- Turn that artifact into a more human-readable report
We would like to keep the original intent but clarify the code by removing stateful functions, duplication, and global variables.
Problems
Our existing solution has two major problems
Code duplication
- Execute: https://github.com/pytorch/serve/blob/master/benchmarks/auto_benchmark.py#L225
- Two functions in this file for ts_version: https://github.com/pytorch/serve/blob/master/benchmarks/auto_benchmark.py#L232
- Duplicating start and stop functions from our test suite https://github.com/pytorch/serve/blob/7e226ea19fdca244f2fb9712d1a62b389442b6a9/benchmarks/benchmark-ab.py#L648
- Duplicating configs: https://github.com/pytorch/serve/blob/7e226ea19fdca244f2fb9712d1a62b389442b6a9/benchmarks/benchmark-ab.py#L494 and https://github.com/pytorch/serve/blob/master/benchmarks/utils/gen_metrics_json.py#L39
- Duplicating config with boolean flag for workflow vs model https://github.com/pytorch/serve/blob/master/benchmarks/benchmark-ab.py#L243 (in general it's better for us to create two separate functions that take some generic config object that's centrally defined; see the sketch below)
- Duplicated params and config https://github.com/pytorch/serve/blob/master/benchmarks/benchmark-ab.py#L124
This is the primary place where a more data-oriented design will help.
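For the boolean-flag point above, this is roughly what I have in mind (a sketch; the function names are made up):
# Hypothetical sketch: rather than one function branching on an `is_workflow` flag,
# two small functions share the same centrally defined config object
def register_model(config: "BenchmarkConfig") -> None:
    ...  # model-specific registration

def register_workflow(config: "BenchmarkConfig") -> None:
    ...  # workflow-specific registration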
Global variables and stateful functions
- An example of a global variable is our config, which each function then overwrites with various parameters: https://github.com/pytorch/serve/blob/master/benchmarks/benchmark-ab.py#L19
- Any function in that file which does not return a value is a stateful function and is confusing to reason about
This problem will be further exacerbated as we add more kinds of charting and reports, more configurations, and more environments, especially now that we're building off the original benchmark-ab.py with a new auto_benchmark.py script. Stateful functions also make it difficult to test our existing benchmark-ab.py, which in turn makes it unsafe to add even minor changes.
Proposed solution
A good chunk of our code here is dedicated to passing parameters, reading them, and converting them from one format to another; those are ideally handled by a Python @dataclass.
Model configurations would include everything that's model specific:
from dataclasses import dataclass
from enum import Enum

URL = str  # simple alias; the original sketch references URL without defining it

class ExecEnv(Enum):  # assumed definition; the sketch references ExecEnv without defining it
    local = "local"
    docker = "docker"

@dataclass
class ModelConfig:
    url: URL = "https://torchserve.pytorch.org/mar_files/resnet-18.mar"
    gpus: str = ""  # Make this better
    exec_env: ExecEnv = ExecEnv.local
    batch_size: int = 1
    batch_delay: int = 200
    workers: int = 1
    concurrency: int = 10
    requests: int = 100
The benchmark config, on the other hand, holds all the parameters that our benchmark suite expects. In our existing design we have coupled both, which makes it more difficult to support new parameters, as they need to be added in multiple places.
import os
import tempfile

@dataclass
class BenchmarkConfig:
    input: str = "../examples/image_classifier/kitten.jpg"
    content_type: str = "application/jpg"
    image: str = ""
    docker_runtime: str = ""
    backend_profiling: bool = False
    config_properties: str = "config.properties"
    inference_model_url: URL = "predictions/benchmark"
    management_model_url: URL = "PLACEHOLDER/placeholder"
    report_location: str = tempfile.gettempdir()
    tmp_dir: str = tempfile.gettempdir()
    synchronous: bool = True
    result_file: str = os.path.join(tmp_dir, "benchmark/result.txt")
    metric_log: str = os.path.join(tmp_dir, "benchmark/logs/model_metrics.log")
    management_api: URL = "http://127.0.0.1:8081"
    inference_api: URL = "http://127.0.0.1:8080"
So we can have benchmark(model_config: ModelConfig, benchmark_config: BenchmarkConfig), which makes it a lot easier to add new fields and drastically reduces the amount of code we need to have.
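As a minimal sketch (assuming the dataclasses above and the BenchmarkArtifact defined further below), the entry point and its usage would look something like:
# Hypothetical sketch of the proposed entry point; the body is deliberately elided
def benchmark(model_config: ModelConfig, benchmark_config: BenchmarkConfig) -> "BenchmarkArtifact":
    ...

# Callers only ever construct the two configs and pass them in
artifact = benchmark(ModelConfig(batch_size=4), BenchmarkConfig())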
We also have a lot of code dedicated to turning command line arguments into the config we expect; with our dataclass design this becomes a few lines of code
from argparse import ArgumentParser
from dataclasses import fields
from itertools import chain

# This is the whole argument parser
parser = ArgumentParser(description="Run the Apache Bench Benchmark suite")
for field in chain(fields(ModelConfig), fields(BenchmarkConfig)):
    parser.add_argument(f"--{field.name}", default=field.default)
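From there the parsed arguments map straight back into the two dataclasses; a rough sketch (no type coercion or validation, just the happy path):
# Hypothetical sketch: rebuild the dataclasses from the parsed CLI namespace
parsed = vars(parser.parse_args())
model_config = ModelConfig(**{f.name: parsed[f.name] for f in fields(ModelConfig)})
benchmark_config = BenchmarkConfig(**{f.name: parsed[f.name] for f in fields(BenchmarkConfig)})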
Our soak tests become a single line of code, for example. This is convenient because we can then leverage property-based testing to generate parameters and benchmark things more thoroughly.
vgg11_1000r_10c = ModelConfig(url="https://torchserve.pytorch.org/mar_files/vgg11.mar", requests=1000, concurrency=10)
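For the property-based testing idea, a rough sketch using the hypothesis library (the test, the parameter ranges, and the benchmark() entry point are assumptions, not existing code):
from hypothesis import given, settings, strategies as st

# Hypothetical sketch: generate ModelConfig parameters with `hypothesis` and smoke-test benchmark()
@given(
    batch_size=st.integers(min_value=1, max_value=32),
    concurrency=st.integers(min_value=1, max_value=100),
    requests=st.integers(min_value=10, max_value=1000),
)
@settings(deadline=None, max_examples=5)
def test_benchmark_smoke(batch_size, concurrency, requests):
    config = ModelConfig(batch_size=batch_size, concurrency=concurrency, requests=requests)
    artifact = benchmark(config, BenchmarkConfig())
    assert artifact.ts_failed_requests == 0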
For our benchmark artifacts there are a few standard metrics we support, so we can standardize them:
@dataclass
class BenchmarkArtifact:
    benchmark_config: BenchmarkConfig
    benchmark: str
    ts_failed_requests: int
    ts_latency_p50: float
    ts_latency_p90: float
    ts_latency_p99: float
    ts_latency: float
The benefit here is that we can create code that takes in a BenchmarkArtifact and creates a report out of it:
from abc import ABC, abstractmethod
from pathlib import Path

class BenchmarkReport(ABC):
    @staticmethod
    @abstractmethod
    def generate_report(artifact: BenchmarkArtifact) -> Path:
        pass
artifact = benchmark(model_config, benchmark_config)
# All of the below are static classes that subclass `BenchmarkReport`
CSVReport.generate_report(artifact)
MDReport.generate_report(artifact)
LatencyGraph.generate_report(artifact)
ProfileGraph.generate_report(artifact)
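For illustration, one concrete report could look something like this (a rough sketch; the output path and CSV layout are just placeholders):
import csv
import tempfile
from dataclasses import asdict
from pathlib import Path

# Hypothetical sketch of one concrete report; the CSV columns are just the artifact's fields
class CSVReport(BenchmarkReport):
    @staticmethod
    def generate_report(artifact: BenchmarkArtifact) -> Path:
        out = Path(tempfile.gettempdir()) / "ab_report.csv"
        row = asdict(artifact)
        with open(out, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=row.keys())
            writer.writeheader()
            writer.writerow(row)
        return out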
Some state is unavoidable: TorchServe writes its metric logs to files. But we can decouple the code that reads the metrics from a file from the code that generates a report out of it by creating a separate MetricFileToArtifactParser
from dataclasses import fields

class MetricFileToArtifactParser:
    def __init__(self):
        self.metrics = Metrics()

    def parse_metrics(self):
        for metric_field in fields(self.metrics):
            metric_file = getattr(self.metrics, metric_field.name)
            # parse_code_here
            pass
The metrics are again defined in their own dataclass, which makes it trivial to keep a central store of all available metrics files and to extend it with new metrics without having to change the rest of our code
from pathlib import Path

@dataclass
class Metrics:
    PredictionTime: Path = Path("predict.txt")
    HandlerTime: Path = Path("handler_time.txt")
    QueueTime: Path = Path("waiting_time.txt")
    WorkerThreadTime: Path = Path("worker_thread.txt")
    CPUUtilization: Path = Path("cpu_percentage.txt")
    MemoryUtilization: Path = Path("memory_percentage.txt")
    GPUUtilization: Path = Path("gpu_percentage.txt")
    GPUMemoryUtilization: Path = Path("gpu_memory_percentage.txt")
    GPUMemoryUsed: Path = Path("gpu_memory_used.txt")
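To show how the parser and the Metrics dataclass fit together, here's a rough sketch of parsing a single metric file (it assumes each metric file holds one numeric value per line; the percentile math is simplified):
from pathlib import Path
from statistics import quantiles

# Hypothetical sketch: read one metric file and compute simple latency percentiles
def parse_metric_file(path: Path) -> dict:
    values = [float(line) for line in path.read_text().splitlines() if line.strip()]
    p50, p90, p99 = (quantiles(values, n=100)[i - 1] for i in (50, 90, 99))
    return {"p50": p50, "p90": p90, "p99": p99, "mean": sum(values) / len(values)}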
We have code that takes in a dictionary and creates a request to TorchServe; we can standardize on a schema that we can leverage in our tests, examples, and benchmarks instead of writing custom parsing code each time we use the requests library
@dataclass
class TorchServeRequest:
    model_name: str
    url: str
    batch_delay: int
    batch_size: int
    initial_workers: int
    synchronous: bool = True
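And as a rough sketch of how this schema could be used with the requests library (the endpoint and parameter mapping are illustrative and should be checked against the management API docs):
from dataclasses import asdict
import requests

# Hypothetical sketch: send a registration request built from the dataclass.
# Note: the real management API uses names like `max_batch_delay`; this passes fields as-is.
def send_registration_request(req: TorchServeRequest, management_api: str = "http://127.0.0.1:8081"):
    return requests.post(f"{management_api}/models", params=asdict(req))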
Then, end to end, the way this all works is
- Given a config
- Set up the environment for benchmarks
- Run the benchmarks
- Clean up
And we can separate out the if conditions that switch between Docker and local environment work by instead defining an abstract base class
from abc import ABC, abstractmethod

class BenchmarkEnvironmentSetup(ABC):
    @abstractmethod
    def prepare_extra_dependency(self) -> None:
        pass

    @abstractmethod
    def start(self) -> None:
        pass

    def run(self) -> None:
        # Template method: the same flow for every environment
        self.prepare_extra_dependency()
        self.start()
        self.stop()

    @abstractmethod
    def stop(self) -> None:
        pass
Each environment, Docker vs local, would then inherit from it and implement the relevant functions it needs to run benchmarks
class LocalEnvironmentSetup(BenchmarkEnvironmentSetup):
    ...

class DockerEnvironmentSetup(BenchmarkEnvironmentSetup):
    ...
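For illustration, a rough sketch of one concrete environment and the end-to-end wiring (the method bodies and the dispatch on exec_env are assumptions, not existing code):
import subprocess

# Hypothetical sketch: one concrete environment plus the end-to-end wiring of the pieces above
class LocalEnvironmentSetup(BenchmarkEnvironmentSetup):
    def prepare_extra_dependency(self) -> None:
        pass  # e.g. install apache2-utils or other locally required packages

    def start(self) -> None:
        subprocess.run(["torchserve", "--start", "--model-store", "model_store"], check=True)

    def stop(self) -> None:
        subprocess.run(["torchserve", "--stop"], check=True)

# End to end: given a config, set up the environment, run the benchmarks, clean up
model_config = ModelConfig()
benchmark_config = BenchmarkConfig()
env = LocalEnvironmentSetup() if model_config.exec_env == ExecEnv.local else DockerEnvironmentSetup()
env.run()  # prepare_extra_dependency() -> start() -> stop()
# ...then parse the metric files into a BenchmarkArtifact and generate reports from it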
Alternatives
No response
Additional context
No response