
Data-oriented & SOLID `benchmarks/benchmark-ab.py`


🚀 The feature

Wrote this after an offline discussion with @lxning

We've recently standardized on benchmarks/benchmark-ab.py as the preferred way to benchmark TorchServe models and used it to build benchmarks/auto_benchmark.py, which is now integrated into our CI. As our benchmark suite becomes more useful, I recently tried to extend it with Docker support and noticed we could do a better job of refactoring it.

Below I'll describe the problems with our current benchmark suite and how to address them with a more data-oriented design using @dataclass and a more SOLID design using ABC, which will make it possible (and easy) to test all of our benchmark scripts.

Motivation, pitch

At a high level, our existing benchmark scripts work by:

  1. Taking some configuration data for a benchmark run, which includes model-specific data, benchmark configurations and some temporary directories.
  2. Picking an environment to initialize (Docker vs local).
  3. Running some inferences on a model.
  4. Inspecting the generated metrics files on disk and parsing them into an artifact.
  5. Taking that artifact and turning it into a more human-readable report.

We would like to keep the original intent but clarify the code by removing stateful functions, duplication and global variables.

Problems

Our existing solution has two major problems

Code duplication

  • Execute: https://github.com/pytorch/serve/blob/master/benchmarks/auto_benchmark.py#L225
  • Two functions in this file for ts_version: https://github.com/pytorch/serve/blob/master/benchmarks/auto_benchmark.py#L232
  • Duplicating start and stop functions from our test suite https://github.com/pytorch/serve/blob/7e226ea19fdca244f2fb9712d1a62b389442b6a9/benchmarks/benchmark-ab.py#L648
  • Duplicating configs: https://github.com/pytorch/serve/blob/7e226ea19fdca244f2fb9712d1a62b389442b6a9/benchmarks/benchmark-ab.py#L494 and https://github.com/pytorch/serve/blob/master/benchmarks/utils/gen_metrics_json.py#L39
  • Duplicating config with boolean flag for workflow vs model https://github.com/pytorch/serve/blob/master/benchmarks/benchmark-ab.py#L243 (in general it's better for us to create two separate functions that take some generic config object that's centrally defined)
  • Duplicated params and config https://github.com/pytorch/serve/blob/master/benchmarks/benchmark-ab.py#L124

This is the primary place where a more data-oriented design will help.

Global variables and stateful functions

  • An example of a global variable is our config, which each function then overwrites with various parameters: https://github.com/pytorch/serve/blob/master/benchmarks/benchmark-ab.py#L19
  • Any function in that file that does not return a value is mutating shared state, which makes it confusing to reason about

This problem will be further exacerbated as we add more kinds of charts and reports, more configurations and more environments, especially now that we're building on top of the original benchmark-ab.py with a new auto_benchmark.py script. Stateful functions also make it difficult to test the existing benchmark-ab.py, which in turn makes it unsafe to make even minor changes.

Proposed solution

A good chunk of our code is dedicated to passing parameters, reading them and converting them from one format to another, all of which is ideally handled by a Python @dataclass.

The model configuration would include everything that's model-specific:

@dataclass
class ModelConfig:
    url: URL = "https://torchserve.pytorch.org/mar_files/resnet-18.mar"
    gpus: str = ""  # Make this better
    exec_env: ExecEnv = ExecEnv.local
    batch_size: int = 1
    batch_delay: int = 200
    workers: int = 1
    concurrency: int = 10
    requests: int = 100
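
The URL and ExecEnv types above aren't defined anywhere yet; a minimal sketch of what they could look like, just to make the config concrete (these names are placeholders, not existing code in the repo):

from enum import Enum

# Simple alias so URLs read as a distinct type in the configs
URL = str

class ExecEnv(Enum):
    local = "local"
    docker = "docker"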

The benchmark config is then all the parameters that our benchmark suite itself expects. In our existing design the two are coupled, which makes it more difficult to support new parameters since they need to be added in multiple places.

@dataclass
class BenchmarkConfig:
    input: str = "../examples/image_classifier/kitten.jpg"
    content_type: str = "application/jpg"
    image: str = ""
    docker_runtime: str = ""
    backend_profiling: bool = False
    config_properties: str = "config.properties"
    inference_model_url: URL = "predictions/benchmark"
    management_model_url: URL = "PLACEHOLDER/placeholder"
    report_location: str = tempfile.gettempdir()
    tmp_dir: str = tempfile.gettempdir()
    synchronous: bool = True
    result_file: str = os.path.join(tmp_dir, "benchmark/result.txt")
    metric_log: str = os.path.join(tmp_dir, "benchmark/logs/model_metrics.log")
    management_api: URL = "http://127.0.0.1:8081"
    inference_api: URL = "http://127.0.0.1:8080"

So we can have benchmark(model_config: ModelConfig, benchmark_config: BenchmarkConfig), which makes it a lot easier to add new fields and drastically reduces the amount of code we need.
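
A rough sketch of that entry point, stitched together from the pieces described later in this issue (the run_apache_bench and parse_artifact helper names are hypothetical):

def benchmark(model_config: ModelConfig, benchmark_config: BenchmarkConfig) -> "BenchmarkArtifact":
    env = LocalEnvironmentSetup(model_config, benchmark_config)  # or DockerEnvironmentSetup
    env.prepare_extra_dependency()
    env.start()
    try:
        run_apache_bench(model_config, benchmark_config)  # hypothetical wrapper around `ab`
        return parse_artifact(benchmark_config)  # hypothetical: reads the metric files written to disk
    finally:
        env.stop()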

We also have a lot of code dedicated to turning command-line arguments into the config we expect; with our dataclass design this becomes a few lines of code:

from argparse import ArgumentParser
from dataclasses import fields

# This is the whole argument parser
parser = ArgumentParser(description="Run the Apache Bench Benchmark suite")
for f in fields(ModelConfig) + fields(BenchmarkConfig):
    parser.add_argument(f"--{f.name}", default=f.default)
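
Assuming the CLI argument names stay in sync with the dataclass field names, the parsed namespace can then be split back into the two configs (type coercion is elided here):

cli_args = vars(parser.parse_args())
model_config = ModelConfig(**{f.name: cli_args[f.name] for f in fields(ModelConfig)})
benchmark_config = BenchmarkConfig(**{f.name: cli_args[f.name] for f in fields(BenchmarkConfig)})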

Our soak tests then become a single line of code, for example the one below. This is convenient because we can then leverage property-based testing to generate parameters and benchmark things more thoroughly.

vgg11_1000r_10c = ModelConfig(url="https://torchserve.pytorch.org/mar_files/vgg11.mar", requests=1000, concurrency=10)
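
A rough sketch of what that property-based testing could look like, assuming the hypothesis library and a benchmark() entry point like the one sketched above:

from hypothesis import given, settings, strategies as st

@settings(deadline=None, max_examples=5)
@given(
    concurrency=st.integers(min_value=1, max_value=100),
    requests=st.integers(min_value=100, max_value=10_000),
)
def test_benchmark_never_drops_requests(concurrency, requests):
    model_config = ModelConfig(concurrency=concurrency, requests=requests)
    artifact = benchmark(model_config, BenchmarkConfig())
    assert artifact.ts_failed_requests == 0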

For our benchmark artifacts there are a few standard metrics we support, so we can standardize those as well:

@dataclass
class BenchmarkArtifact:
    benchmark_config : BenchmarkConfig
    benchmark : str
    ts_failed_requests : int
    ts_latency_p50 : float 
    ts_latency_p90 : float
    ts_latency_p99 : float
    ts_latency : float

The benefit here is that we can create code that takes in a BenchmarkArtifact and creates a report out of it

from abc import ABC, abstractmethod
from pathlib import Path

class BenchmarkReport(ABC):
    @staticmethod
    @abstractmethod
    def generate_report(artifact: BenchmarkArtifact) -> Path:
        pass


artifact = benchmark(model_config, benchmark_config)

# All of the below are static classes that subclass `BenchmarkReport`
CSVReport.generate_report(artifact)
MDReport.generate_report(artifact)
LatencyGraph.generate_report(artifact)
ProfileGraph.generate_report(artifact)
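
As an example, a CSVReport (a hypothetical subclass sketched here, not code that exists today) could be as small as:

import csv
from dataclasses import fields

class CSVReport(BenchmarkReport):
    @staticmethod
    def generate_report(artifact: BenchmarkArtifact) -> Path:
        out = Path(artifact.benchmark_config.report_location) / "ab_report.csv"
        # Flatten every metric field of the artifact into a single CSV row
        row = {f.name: getattr(artifact, f.name) for f in fields(artifact) if f.name != "benchmark_config"}
        with open(out, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(row.keys())
            writer.writerow(row.values())
        return out

MDReport, LatencyGraph and ProfileGraph would follow the same interface, so adding a new report type never touches the benchmark code itself.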

Some state is unavoidable: TorchServe writes its metric logs to files. But we can decouple the code that reads the metrics from a file from the code that generates a report out of them by creating a separate MetricFileToArtifactParser.

class MetricFileToArtifactParser:
    def __init__(self):
        self.metrics = Metrics()

    def parse_metrics(self):
        for metric in fields(self.metrics):
            # parse_code_here
            ...

The metrics are again defined in their own dataclass, which makes it trivial to keep a central store of all available metrics files and to add new metrics without having to change the rest of our code:

@dataclass
class Metrics:
    PredictionTime: Path = "predict.txt"
    HandlerTime: Path = "handler_time.txt"
    QueueTime: Path = "waiting_time.txt"
    WorkerThreadTime: Path = "worker_thread.txt"
    CPUUtilization: Path = "cpu_percentage.txt"
    MemoryUtilization: Path = "memory_percentage.txt"
    GPUUtilization: Path = "gpu_percentage.txt"
    GPUMemoryUtilization: Path = "gpu_memory_percentage.txt"
    GPUMemoryUsed: Path = "gpu_memory_used.txt"
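
A rough sketch of what the parsing itself could look like, assuming each metrics file contains one numeric value per line; the directory layout and the mean aggregation are illustrative, not the existing implementation:

import statistics
from dataclasses import fields

def parse_metric_files(metrics: Metrics, log_dir: str) -> dict:
    # One summary value (the mean) per metric; percentiles could be added the same way
    summary = {}
    for metric in fields(metrics):
        path = os.path.join(log_dir, getattr(metrics, metric.name))
        with open(path) as f:
            values = [float(line) for line in f if line.strip()]
        summary[metric.name] = statistics.mean(values) if values else 0.0
    return summary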

We also have code that takes in a dictionary and creates a request to TorchServe. We can standardize on a schema that we can leverage in our tests, examples and benchmarks instead of writing custom parsing code each time we use the requests library:

@dataclass
class TorchServeRequest:
    model_name: str
    url: str
    batch_delay: int
    batch_size: int
    initial_workers: int
    synchronous: bool = True
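
For example, turning that schema into a model-registration call against the management API could look like the sketch below (the query parameters mirror TorchServe's register-model API, but treat the exact mapping as an assumption rather than a drop-in implementation):

import requests
from dataclasses import asdict

def register_model(req: TorchServeRequest, management_api: str = "http://127.0.0.1:8081") -> None:
    params = asdict(req)
    # The management API calls this parameter max_batch_delay
    params["max_batch_delay"] = params.pop("batch_delay")
    response = requests.post(f"{management_api}/models", params=params)
    response.raise_for_status()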

Then, end to end, the way this all works is:

  1. Take a config
  2. Set up the environment for the benchmarks
  3. Run the benchmarks
  4. Clean up

And we can replace the if conditions that switch between the Docker and local environments with an abstract base class:

class BenchmarkEnvironmentSetup(ABC):

    @abstractmethod
    def prepare_extra_dependency(self) -> None:
        pass

    @abstractmethod
    def start(self) -> None:
        pass

    @abstractmethod
    def stop(self) -> None:
        pass

    def run(self) -> None:
        # Shared lifecycle: every environment prepares, starts and stops the same way
        self.prepare_extra_dependency()
        self.start()
        self.stop()

Each of the Docker and local environments would then inherit from it and implement the functions they need to run benchmarks:

class LocalEnvironmentSetup(BenchmarkEnvironmentSetup):
    ...

class DockerEnvironmentSetup(BenchmarkEnvironmentSetup):
    ...
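
As a rough illustration of how the local environment might fill those in (the torchserve --start/--stop CLI calls exist today, but the exact flags and structure here are a sketch, not a drop-in implementation):

import subprocess

class LocalEnvironmentSetup(BenchmarkEnvironmentSetup):
    def __init__(self, model_config: ModelConfig, benchmark_config: BenchmarkConfig):
        self.model_config = model_config
        self.benchmark_config = benchmark_config

    def prepare_extra_dependency(self) -> None:
        # e.g. create the tmp directories the benchmark expects
        os.makedirs(os.path.join(self.benchmark_config.tmp_dir, "benchmark"), exist_ok=True)

    def start(self) -> None:
        subprocess.run(
            ["torchserve", "--start", "--ts-config", self.benchmark_config.config_properties],
            check=True,
        )

    def stop(self) -> None:
        subprocess.run(["torchserve", "--stop"], check=True)

DockerEnvironmentSetup would implement the same three methods with docker pull / docker run / docker rm calls instead, and the rest of the benchmark code would never need to know which environment it's talking to.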

Alternatives

No response

Additional context

No response
