
[onert] Quantization type kernel for transformer

Open · hseok-oh opened this issue 9 months ago · 1 comment

Below is the list of I/O quantization type (uint8/int16) kernels required for a quantized transformer model:

  • [ ] MUL
    • [x] UINT8
    • [ ] INT16
  • [ ] ADD
    • [x] UINT8
    • [ ] INT16
  • [ ] RSQRT
    • [ ] UINT8
    • [ ] INT16
  • [ ] DIV
    • [ ] UINT8
    • [ ] INT16
  • [x] RESHAPE (same I/O quant param)
  • [ ] TRANSPOSE (same I/O quant param)
    • [x] UINT8
    • [ ] INT16
  • [ ] STRIDED_SLICE (same I/O quant param)
    • [ ] UINT8
    • [ ] INT16
  • [ ] NEG
    • [ ] UINT8
    • [ ] INT16
  • [ ] CONCATENATION
    • [x] UINT8
    • [ ] INT16
  • [ ] BATCH_MATMUL
    • [ ] UINT8
    • [ ] INT16
  • [ ] SOFTMAX
    • [x] UINT8
    • [ ] INT16
  • [ ] LOGISTIC
    • [x] UINT8
    • [ ] INT16
  • [ ] GATHER (indices: int32/int64)
    • [x] UINT8
    • [ ] INT16
  • [ ] MEAN
    • [x] UINT8
    • [ ] INT16
  • [ ] SQRT
    • [ ] UINT8
    • [ ] INT16

Quantization type change

  • [ ] QUANTIZE
    • [ ] UINT8 -> INT16
    • [ ] INT16 -> UINT8

I/O and weight quantization types for the transformer model

  • [ ] FULLY_CONNECTED (channelwise quantization)
    • [ ] UINT4 weight, UINT8 I/O (#12741)
    • [ ] UINT8 I/O and weight
    • [ ] INT16 I/O and weight

hseok-oh · Apr 30 '24 06:04