Modularize ipex optimization in `base_handler.py` into `ts/utils/ipex_optimization.py`.

Open min-jean-cho opened this issue 3 years ago • 9 comments

This PR

  1. Updates IPEX integration into TorchServe following #1631.

As described in #1631, model optimization currently looks like this:

# in base_handler.py
if ipex_enabled:
    self.model = self.model.to(memory_format=torch.channels_last)
    self.model = ipex.optimize(self.model)

Model optimization after this PR:

# in ts/utils/optimization.py
import torch

class Optimization:
    def __init__(self, model):
        self.model = model

    def optimize(self) -> torch.nn.Module:
        raise NotImplementedError("This is an abstract base class; subclass it to implement your own optimization")

# in ts/utils/ipex_optimization.py
import intel_extension_for_pytorch as ipex

from ts.utils.optimization import Optimization

class IPEXOptimization(Optimization):
    def __init__(self, model, **kwargs):
        super().__init__(model)

    def optimize(self):
        self.model = ipex.optimize(self.model)
        return self.model

# in base_handler.py
if ipex_enabled or onednn_graph_fusion_enabled:
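    # OPTIMIZATIONS: a registry mapping names like "ipex" to Optimization subclasses (a sketch is given further below)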
    self.optimization = OPTIMIZATIONS["ipex"](self.model, ipex_enabled, onednn_graph_fusion_enabled)
    self.model = self.optimization.optimize()
  2. Integrates IPEX optimization features (bfloat16 auto mixed precision, oneDNN graph fusion, int8, etc.) into ts/utils/ipex_optimization.py.

2.1 Refactors the IPEX integration with a more flexible interface for IPEX features. Let's demonstrate with an exemplary IPEX feature, int8. Currently, to create IPEX int8 .mar files, users have to follow step-by-step guidelines like the one given here. Now, instead, users can simply provide a .mar file without any IPEX optimization, as they usually would, and specify the following in config.properties:

ipex_enable=true
ipex_dtype=int8
ipex_torchscript=true
ipex_input_tensor_shapes=1, 3, 224, 224
ipex_input_tensor_dtype=TYPE_FP32

ts/utils/ipex_optimization.py will then take care of the IPEX int8 optimization.
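For illustration, here is a minimal sketch of how ts/utils/ipex_optimization.py could wire these config values into ipex.optimize and TorchScript tracing, together with the OPTIMIZATIONS registry used above. The constructor arguments, defaults, and the way the config.properties values reach the class are assumptions, not the actual implementation:

# in ts/utils/ipex_optimization.py (illustrative sketch, not the actual PR code)
import torch
import intel_extension_for_pytorch as ipex

from ts.utils.optimization import Optimization

class IPEXOptimization(Optimization):
    def __init__(self, model, dtype="float32", torchscript=False,
                 input_tensor_shapes=(1, 3, 224, 224), **kwargs):
        super().__init__(model)
        self.dtype = dtype
        self.torchscript = torchscript
        self.input_tensor_shapes = tuple(input_tensor_shapes)

    def optimize(self):
        # channels_last + ipex.optimize, as in the original base_handler path
        self.model = self.model.to(memory_format=torch.channels_last)
        if self.dtype == "bfloat16":
            self.model = ipex.optimize(self.model, dtype=torch.bfloat16)
        else:
            self.model = ipex.optimize(self.model)
        # int8 would additionally run calibration/quantization here (omitted)
        if self.torchscript:
            example_input = torch.randn(self.input_tensor_shapes).to(memory_format=torch.channels_last)
            with torch.no_grad(), torch.cpu.amp.autocast(enabled=(self.dtype == "bfloat16")):
                self.model = torch.jit.trace(self.model, example_input)
                self.model = torch.jit.freeze(self.model)
        return self.model

# simple registry consumed by base_handler.py
OPTIMIZATIONS = {"ipex": IPEXOptimization}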

With the base_handler modularized, future IPEX feature integrations will all be located in (1) ts/utils/ipex_optimization.py, with minor (if any) changes to base_handler.py, and (2) ConfigManager.java as needed.

min-jean-cho avatar Jun 01 '22 19:06 min-jean-cho

Hi @min-jean-cho, I apologize, but can you please pause your work here? After discussing with @lxning offline, we believe a better long-term proposal would be to extend the message format we use between the frontend and the backend. The ConfigManager should really be for model-specific configs that a user needs to care about. Environment variables are indeed convenient, but they don't expose the full power of TorchServe as a multi-model framework.

For example, this is what our current message protocol looks like: https://github.com/pytorch/serve/blob/master/ts/protocol/otf_message_handler.py#L183-L193. I'll take a look next week at extending it.

msaroufim avatar Jun 03 '22 17:06 msaroufim

Hi @msaroufim, no problem. Will stay tuned for an update on the long-term proposal with the extended message format. Thanks!

min-jean-cho avatar Jun 03 '22 17:06 min-jean-cho

Hi @minjeanc, let's revisit this PR

Instead of using environment variables, you can create your own config file called ipex_config.json/yaml/txt/etc., and pass it in using --extra-files when archiving a model (https://github.com/pytorch/serve/tree/master/model-archiver#arguments) - JSON is what we've mostly been using.

Then, from the handler, check if that file exists (example: https://github.com/pytorch/serve/blob/master/ts/torch_handler/base_handler.py#L108), parse it, and set all the configurations you need. This should also let you revert the changes to ConfigManager.java, so moving forward you should only need to write Python code.
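A minimal sketch of that pattern, assuming a hypothetical ipex_config.json packaged via --extra-files (the file name and keys are illustrative, not an existing TorchServe convention):

# in the handler (sketch)
import json
import os

def load_ipex_config(model_dir):
    """Return the IPEX config dict if ipex_config.json was packaged, else {}."""
    cfg_path = os.path.join(model_dir, "ipex_config.json")
    if not os.path.isfile(cfg_path):
        return {}
    with open(cfg_path) as f:
        return json.load(f)

# in initialize():
#     cfg = load_ipex_config(model_dir)
#     dtype = cfg.get("dtype", "float32")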

msaroufim avatar Sep 12 '22 12:09 msaroufim

Thanks @msaroufim, I will have a look at this.

min-jean-cho avatar Sep 12 '22 15:09 min-jean-cho

Hi @msaroufim, I have made some updates. Let me know what you think.

The updates are:

  • Add the IPEX optimization config and schema in ts/torch_handler/utils/conf/ipex_config.py. Users can pass their IPEX optimization config via --extra-files ipex_config.yaml. The choice of YAML vs. JSON was arbitrary -- happy to go with either option. The config file format can be documented in the IPEX README. In the future, other custom optimization configs (and schemas) can be added in ts/torch_handler/utils/conf/custom_config.py.

  • Add the IPEX optimization in ts/torch_handler/utils/optimization/ipex_optimization.py. In the future, other custom optimizations can be added in ts/torch_handler/utils/optimization/custom_optimization.py.

  • In the base_handler, the IPEX optimization is invoked following its config. In the future, other custom optimizations can similarly be invoked in the base_handler.

The base_handler now looks like the following. Model optimization currently:

# in base_handler.py
if ipex_enabled:
    self.model = self.model.to(memory_format=torch.channels_last)
    self.model = ipex.optimize(self.model)

Model optimization after the updates:

# in base_handler.py
if ipex_enabled:
    cfg_file_path = os.path.join(model_dir, "ipex_config.yaml")
    self.cfg = CONFIGURATIONS["ipex"](cfg_file_path)
    self.optimization = OPTIMIZATIONS["ipex"](self.cfg)

    self.model = self.optimization.optimize(self.model)
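For reference, a rough sketch of what the config side could look like. The YAML keys and the loader below are hypothetical placeholders, not the PR's actual schema:

# in ts/torch_handler/utils/conf/ipex_config.py (illustrative sketch)
import yaml

# Example ipex_config.yaml contents (hypothetical keys):
#   dtype: bfloat16
#   torchscript: true
#   input_tensor_shapes: [1, 3, 224, 224]

class IPEXConfig:
    DEFAULTS = {"dtype": "float32", "torchscript": False,
                "input_tensor_shapes": [1, 3, 224, 224]}

    def __init__(self, cfg_file_path):
        with open(cfg_file_path) as f:
            user_cfg = yaml.safe_load(f) or {}
        self.cfg = {**self.DEFAULTS, **user_cfg}

    def __getitem__(self, key):
        return self.cfg[key]

CONFIGURATIONS = {"ipex": IPEXConfig}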

min-jean-cho avatar Sep 14 '22 01:09 min-jean-cho

Do we have UTs to cover various features added in this PR, e.g., inference with fp32/bf16 and int8 quantization?

@msaroufim , may I add IPEX-related UTs under this directory: https://github.com/pytorch/serve/tree/master/ts/torch_handler/unit_tests

min-jean-cho avatar Sep 20 '22 17:09 min-jean-cho

We can add unit tests to test/pytest. Also, after chatting with @HamidShojanazeri offline, we should split optimizations into 2 different discussions:

  1. Optimizations done after training but before inference
  2. Optimizations during inference via a runtime

So, for example, (1) would be any optimization that works after you run torch.save(), and (2) would be something like onnxruntime.InferenceSession(model_onnx_path, extra_args).

The reason we need to make this distinction is that optimizations in the style of (1) should not be happening in a handler; they should happen either as part of a separate pre-inference step like https://github.com/msaroufim/torchprep or as an example with documentation. But optimizations in (2) have no choice but to be in a handler. Otherwise, the most extreme case would be trying to get good out-of-the-box performance with some optimizations by having a full training loop happen in initialize(), and that just feels wrong.
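To make the distinction concrete, here is a minimal sketch; the file names and the ONNX example are illustrative assumptions. Category (1) runs in an offline script before archiving, while category (2) lives in the handler's initialize():

# (1) offline, pre-inference step -- run once before creating the .mar file
import torch

model = torch.load("model.pt")            # hypothetical trained model
model.eval()
scripted = torch.jit.script(model)        # optimization done ahead of serving
scripted.save("model_optimized.pt")       # packaged later into the .mar file

# (2) runtime optimization -- has to live in the handler
import onnxruntime

class OnnxHandler:
    def initialize(self, context):
        # a runtime inference session can only be created where inference runs
        self.session = onnxruntime.InferenceSession("model.onnx")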

msaroufim avatar Sep 22 '22 02:09 msaroufim

Thanks @msaroufim -- makes sense. I see most IPEX optimizations fit into (1). I see the bf16 AMP context manager torch.cpu.amp.autocast is necessary for (2), like here: https://github.com/pytorch/serve/pull/1664/files#diff-d30e1f5ef9fd05e2ab9f652c116461632586cc132dde5d23ac4bc5cd19c799ccR196. cc @jgong5 in case of any comments on this.
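As a concrete illustration of that (2)-style usage, a minimal sketch of wrapping the handler's inference call in the bf16 autocast context manager (the function name and flag are hypothetical):

import torch

def inference(model, data, bf16_enabled=True):
    # torch.cpu.amp.autocast casts eligible ops to bfloat16 at runtime
    with torch.no_grad(), torch.cpu.amp.autocast(enabled=bf16_enabled, dtype=torch.bfloat16):
        return model(data)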

min-jean-cho avatar Sep 22 '22 02:09 min-jean-cho

I agree with categorizing the optimizations into two parts. The (2) should be much lighter to apply than (1). But I think it's somewhat different w.r.t. IPEX optimizations: most IPEX optimizations fit (2), except for int8 calibration, which takes a longer time and fits (1) better. More details:

  1. For FP32 and BF16, the extra steps are ipex.optimize and JIT tracing, which are light.
  2. For INT8, the extra steps are calibration, quantization, and JIT tracing. The "calibration" part is a bit heavy, while "quantization" is more like ipex.optimize and lightweight. But we can do the calibration step offline and save the int8 recipe as a JSON file to be loaded at runtime.

With this, can we still keep the FP32/BF16 optimization in (2) and move the int8 calibration step to (1)? We may still leave an option for users to do calibration at runtime if they don't want an offline preparation step, for the sake of flexible usage.
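For what the offline int8 preparation step could look like, here is a rough sketch based on IPEX's static quantization flow; the exact API (in particular saving the recipe via save_qconf_summary) is an assumption about the IPEX version of the time and may differ:

import torch
import intel_extension_for_pytorch as ipex
from intel_extension_for_pytorch.quantization import prepare, convert

model = torch.load("model.pt")              # hypothetical FP32 eager-mode model
model.eval()
example_input = torch.randn(1, 3, 224, 224)
calibration_data = [torch.randn(1, 3, 224, 224) for _ in range(10)]  # placeholder calibration set

# 1) calibration (the heavier step, done offline)
qconfig = ipex.quantization.default_static_qconfig
prepared = prepare(model, qconfig, example_inputs=example_input, inplace=False)
for sample in calibration_data:
    prepared(sample)
prepared.save_qconf_summary(qconf_summary="int8_recipe.json")  # recipe reused at runtime (API assumption)

# 2) quantization + JIT tracing (lightweight, could also run at load time)
quantized = convert(prepared)
with torch.no_grad():
    traced = torch.jit.trace(quantized, example_input)
    traced = torch.jit.freeze(traced)
traced.save("model_int8.pt")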

A separate question: does TorchServe support a JIT scripted module as input? In that case, the IPEX runtime optimization steps can be avoided for all data types.

jgong5 avatar Sep 22 '22 03:09 jgong5

@min-jean-cho Thank you for the contribution. Could you please add documentation and test log in this PR?

lxning avatar Oct 04 '22 18:10 lxning

Thanks @lxning. Does documentation refer to an update to https://github.com/pytorch/serve/blob/master/examples/intel_extension_for_pytorch/README.md and/or new documentation on how to add a custom config & optimization? And does test log refer to the TorchServe logs in ./logs?

min-jean-cho avatar Oct 06 '22 05:10 min-jean-cho

Closing for now; we can revisit this kind of refactor later.

msaroufim avatar Apr 26 '23 03:04 msaroufim