[onert] Quantization type kernel for transformer
Below are the kernels required to support each I/O quantization type (uint8/int16) for the quantized transformer model.
- [ ] MUL
  - [x] UINT8
  - [ ] INT16
- [ ] ADD
  - [x] UINT8
  - [ ] INT16
- [ ] RSQRT
  - [ ] UINT8
  - [ ] INT16
- [ ] DIV
  - [ ] UINT8
  - [ ] INT16
- [x] RESHAPE (same I/O quant param)
- [ ] TRANSPOSE (same I/O quant param)
  - [x] UINT8
  - [ ] INT16
- [ ] STRIDED_SLICE (same I/O quant param)
  - [ ] UINT8
  - [ ] INT16
- [ ] NEG
  - [ ] UINT8
  - [ ] INT16
- [ ] CONCATENATION
  - [x] UINT8
  - [ ] INT16
- [ ] BATCH_MATMUL
  - [ ] UINT8
  - [ ] INT16
- [ ] SOFTMAX
  - [x] UINT8
  - [ ] INT16
- [ ] LOGISTIC
  - [x] UINT8
  - [ ] INT16
- [ ] GATHER (indices: int32/int64)
  - [x] UINT8
  - [ ] INT16
- [ ] MEAN
  - [x] UINT8
  - [ ] INT16
- [ ] SQRT
  - [ ] UINT8
  - [ ] INT16
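As a rough sketch of what a uint8 elementwise kernel such as MUL involves, the product is computed on zero-point-adjusted integers and rescaled into the output's quantization parameters. The function name and signature below are illustrative only, not onert's actual kernel interface:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative per-element quantized MUL (not onert's real signature):
// real value of a quantized tensor element q is (q - zero_point) * scale.
uint8_t QuantizedMul(uint8_t x, uint8_t y,
                     float x_scale, int32_t x_zp,
                     float y_scale, int32_t y_zp,
                     float out_scale, int32_t out_zp)
{
  // Integer product of the zero-point-adjusted inputs.
  const int32_t acc = (static_cast<int32_t>(x) - x_zp) *
                      (static_cast<int32_t>(y) - y_zp);
  // Combined rescale factor from input scales to the output scale.
  const float multiplier = x_scale * y_scale / out_scale;
  const int32_t result =
      static_cast<int32_t>(std::lround(acc * multiplier)) + out_zp;
  // Clamp to the uint8 range.
  return static_cast<uint8_t>(std::clamp(result, 0, 255));
}
```

A production kernel would typically replace the float multiplier with a fixed-point multiplier and shift, but the arithmetic is the same.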
Quantization type change
- [ ] QUANTIZE
  - [ ] UINT8 -> INT16
  - [ ] INT16 -> UINT8
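The UINT8 -> INT16 direction of QUANTIZE amounts to decoding with the input's quantization parameters and re-encoding with the output's. A minimal sketch, with illustrative names rather than onert's actual QUANTIZE interface:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Illustrative uint8 -> int16 requantization (not onert's real interface).
// Note: int16 quantization is commonly symmetric, i.e. out_zp == 0.
int16_t RequantizeU8ToI16(uint8_t in,
                          float in_scale, int32_t in_zp,
                          float out_scale, int32_t out_zp)
{
  // Decode to the real value, then re-encode with the output params.
  const float real = (static_cast<int32_t>(in) - in_zp) * in_scale;
  const int32_t q =
      static_cast<int32_t>(std::lround(real / out_scale)) + out_zp;
  // Clamp to the int16 range.
  return static_cast<int16_t>(std::clamp(q, -32768, 32767));
}
```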
I/O and weight quantization types for the transformer model
- [ ] FULLY_CONNECTED (channelwise quantization)
  - [ ] UINT4 weight, UINT8 I/O (#12741)
  - [ ] UINT8 I/O and weight
  - [ ] INT16 I/O and weight
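Channelwise quantization means each output channel (each weight row of FULLY_CONNECTED) carries its own scale, so the accumulator for each channel is rescaled with a per-channel multiplier. The sketch below uses illustrative names and signed int8 weights as an assumption; it is not onert's actual kernel:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Illustrative channelwise quantized FULLY_CONNECTED (no bias for brevity).
// weights[ch] is the weight row for output channel ch, with scale w_scales[ch].
std::vector<uint8_t> FullyConnectedChannelwise(
    const std::vector<uint8_t>& input, float in_scale, int32_t in_zp,
    const std::vector<std::vector<int8_t>>& weights,
    const std::vector<float>& w_scales,
    float out_scale, int32_t out_zp)
{
  std::vector<uint8_t> output(weights.size());
  for (size_t ch = 0; ch < weights.size(); ++ch)
  {
    // Integer dot product of the zero-point-adjusted input with this row.
    int32_t acc = 0;
    for (size_t i = 0; i < input.size(); ++i)
      acc += (static_cast<int32_t>(input[i]) - in_zp) *
             static_cast<int32_t>(weights[ch][i]);
    // Per-channel multiplier: input scale times this channel's weight scale.
    const float multiplier = in_scale * w_scales[ch] / out_scale;
    const int32_t q =
        static_cast<int32_t>(std::lround(acc * multiplier)) + out_zp;
    output[ch] = static_cast<uint8_t>(std::clamp(q, 0, 255));
  }
  return output;
}
```

The UINT4-weight variant (#12741) would additionally unpack two 4-bit weights per byte before the same per-channel rescale.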