[Quantization] layer-wise quantization
To keep the original precision of certain nodes, Glow provides the keep-original-precision-for-nodes option. However, sometimes we need to keep the precision of specific nodes rather than all nodes of a given kind. For example, it is a well-known fact that the first and last layers are sensitive to quantization and prone to accuracy drops.
To prevent quantization of specific nodes, layer-wise quantization is necessary. I see two options for implementing it:
- Implement layer-wise quantization in the transformForPrecisionMode pass (a rough sketch follows this list).
- Implement it in the partitioner. The available precision depends on the backend hardware, so partitioning should support an option to specify the precision of operators.
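For concreteness, here is a minimal standalone sketch of the skip logic behind the first option, with Glow's types stubbed out. The doNotQuantizeNodeNames set is a hypothetical extension checked by node name, alongside the existing kind-based skip; none of the names here are real Glow API, and in an actual patch this predicate would live inside the precision transform.

```cpp
// Standalone sketch (C++14): decide per node whether to keep float precision.
// NodeKind/Node are stand-ins for Glow's Kinded::Kind and Node types.
#include <string>
#include <unordered_set>

enum class NodeKind { Convolution, FullyConnected, Relu };

struct Node {
  NodeKind kind;
  std::string name; // e.g. "conv1" for the first layer.
};

// True if the node should keep its original (float) precision, either because
// its kind is in the kind-based skip set (Glow's existing mechanism) or
// because its name is in the hypothetical name-based skip set.
bool shouldKeepOriginalPrecision(
    const Node &node,
    const std::unordered_set<NodeKind> &doNotQuantizeKinds,
    const std::unordered_set<std::string> &doNotQuantizeNodeNames) {
  return doNotQuantizeKinds.count(node.kind) > 0 ||
         doNotQuantizeNodeNames.count(node.name) > 0;
}

int main() {
  // Keep the first and last layers (by name) in float; quantize the rest.
  std::unordered_set<std::string> skipNames = {"conv1", "fc_out"};
  Node first{NodeKind::Convolution, "conv1"};
  Node middle{NodeKind::Convolution, "conv2"};
  return shouldKeepOriginalPrecision(first, {}, skipNames) &&
                 !shouldKeepOriginalPrecision(middle, {}, skipNames)
             ? 0
             : 1;
}
```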
Which one is the more plausible approach? If anyone has other suggestions, feel free to leave a comment below.
Hi @leejaymin, I think it makes more sense to implement this inside transformForPrecisionMode(). We provide the Backend we are optimizing for in that function anyway, so you can query Backend::isOpSupported() to determine whether the precision makes sense for that backend.
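Roughly, the flow would look like the sketch below, with Glow's interfaces stubbed out. Backend::isOpSupported() is the real query being referenced; everything else here (the Node stub, the keepOriginalPrecisionNames parameter, the pass name) is illustrative only, not actual Glow code.

```cpp
// Sketch only: walk the graph, skip user-listed layers, and ask the backend
// whether the quantized version of each remaining node is supported.
#include <string>
#include <unordered_set>
#include <vector>

enum class Precision { Float, Int8 };

struct Node {
  std::string name;
  Precision precision = Precision::Float;
};

// Stub for Backend::isOpSupported(NodeInfo); a real backend would inspect the
// node's kind and its input/output types at the target precision.
struct Backend {
  bool isOpSupported(const Node & /*nodeAtTargetPrecision*/) const {
    return true; // Assume everything is supported for this sketch.
  }
};

// Hypothetical per-layer precision pass: quantize every node except those
// named in keepOriginalPrecisionNames, and only when the backend supports
// the quantized version.
void transformForPrecisionModeSketch(
    std::vector<Node> &graph, const Backend &backend,
    const std::unordered_set<std::string> &keepOriginalPrecisionNames) {
  for (Node &node : graph) {
    if (keepOriginalPrecisionNames.count(node.name)) {
      continue; // Layer-wise skip: leave this node in float.
    }
    Node quantized = node;
    quantized.precision = Precision::Int8;
    if (backend.isOpSupported(quantized)) {
      node = quantized; // Backend accepts the int8 version; convert it.
    }
  }
}
```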
I'm also wondering what you have in mind for specifying which layers should have which precision. If it's as simple as always leaving the first and last layers in the original precision, then that should be relatively easy to implement generically. I brainstormed doing this on a per-layer basis a couple of years ago in https://github.com/pytorch/glow/issues/2080, but it never went anywhere, and IIRC I didn't really love the possible solutions I came up with.