
[QuantizationModifier] pydantic classes for defining quantization schemes to generate QConfigs

Open · bfineran opened this issue 3 years ago · 1 comment

A core feature of the QuantizationModifier refactor is giving users both simpler and more fine-grained control over how quantization is applied, to the model at large and to its individual pieces. To support this, this PR introduces two simple pydantic models that define how a particular layer's inputs, weights, and outputs are quantized.

Intended use: QuantizationArgs defines how quantization should be applied to a particular component via num_bits, symmetric, and optional kwargs that map to PyTorch quantization Observer arguments. QuantizationScheme builds on QuantizationArgs so that args can be set separately for inputs, weights, and outputs. If any of these values is set to None, that component of the layer will not be quantized.
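As a rough illustration, the two models might look like the sketch below; the exact field names and defaults are assumptions inferred from the YAML example further down, not the merged code:

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, Field


class QuantizationArgs(BaseModel):
    # how one component (inputs, weights, or outputs) is quantized
    num_bits: int = 8
    symmetric: bool = False
    # extra kwargs forwarded to the PyTorch quantization Observer
    kwargs: Dict[str, Any] = Field(default_factory=dict)


class QuantizationScheme(BaseModel):
    # per-component args; None disables quantization for that component
    input_activations: Optional[QuantizationArgs] = QuantizationArgs()
    weights: Optional[QuantizationArgs] = QuantizationArgs(symmetric=True)
    output_activations: Optional[QuantizationArgs] = None
```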

Helper methods are included to create Observer and QConfig objects from the pydantic models.
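A minimal sketch of what such helpers could look like, assuming the torch.ao.quantization APIs; the helper names and their defaults here are hypothetical, not the PR's actual signatures:

```python
import torch
from torch.ao.quantization import FakeQuantize, MovingAverageMinMaxObserver, QConfig


def observer_from_args(args: QuantizationArgs):
    # hypothetical helper: map QuantizationArgs onto a FakeQuantize factory
    if args.symmetric:
        qscheme, dtype = torch.per_tensor_symmetric, torch.qint8
        quant_min, quant_max = -(2 ** (args.num_bits - 1)), 2 ** (args.num_bits - 1) - 1
    else:
        qscheme, dtype = torch.per_tensor_affine, torch.quint8
        quant_min, quant_max = 0, 2 ** args.num_bits - 1
    return FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=quant_min,
        quant_max=quant_max,
        dtype=dtype,
        qscheme=qscheme,
        **args.kwargs,
    )


def qconfig_from_scheme(scheme: QuantizationScheme) -> QConfig:
    # hypothetical helper: a QConfig pairs an activation observer with a weight observer
    return QConfig(
        activation=observer_from_args(scheme.input_activations),
        weight=observer_from_args(scheme.weights),
    )
```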

These classes will be built on for parsing recipe inputs; recipe users will not interface with these objects directly. A post-refactor modifier YAML may look something like:

```yaml
!QuantizationModifier
  ...
  submodule_schemes:
    "bert.encoder":
      input_activations:
        num_bits: 8
        symmetric: False
      weights:
        num_bits: 8
        symmetric: True
      output_activations: null
```
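For illustration, under the sketched models above, parsing one submodule's entry from such a recipe could be as simple as handing the loaded YAML to pydantic (this snippet assumes the hypothetical QuantizationScheme from earlier):

```python
import yaml

raw = yaml.safe_load(
    """
    input_activations:
      num_bits: 8
      symmetric: False
    weights:
      num_bits: 8
      symmetric: True
    output_activations: null
    """
)
scheme = QuantizationScheme.parse_obj(raw)
assert scheme.output_activations is None  # output quantization disabled
assert scheme.weights.symmetric is True
```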

Test plan: unit tests included.

bfineran · Sep 30 '22

@anmarques @rahul-tuli assigned for review

github-actions[bot] · Sep 30 '22