SparseInst

Help wanted for quantized int8 model

[Open] Stephenzza opened this issue 3 years ago • 4 comments

Thanks for your fantastic work! I just quantized the model to INT8 and found that the performance dropped severely (mAP from 0.4 to 0.2). I suspect the matmul operator caused this problem. Do you have any suggestions?

Stephenzza • Oct 19 '22 08:10

(image attached) I found that after quantization, the matmul operator's max value is too large, so the 0~255 range of INT8 cannot represent all the values.
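For context, here is a minimal sketch (toy tensor, not SparseInst code) of why one large outlier is a problem for per-tensor uint8 quantization: the scale is set by the max value, so all the small activations collapse into a handful of bins.

import torch

# mostly small values plus one large outlier, mimicking an exploded matmul output
x = torch.cat([torch.randn(1000), torch.tensor([5000.0])])

# per-tensor uint8 quantization parameters: the outlier dictates the scale
scale = (x.max() - x.min()) / 255.0
zero_point = torch.round(-x.min() / scale)

q = torch.clamp(torch.round(x / scale) + zero_point, 0, 255)  # quantize to 0~255
x_hat = (q - zero_point) * scale                              # dequantize

print("scale:", scale.item())                        # huge scale because of the outlier
print("max abs error:", (x - x_hat).abs().max().item())  # the small values are nearly all rounded to the same bin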

Stephenzza • Oct 20 '22 06:10

Awesome! Hi @Stephenzza, thanks for your interest in SparseInst! Could you specify exactly which operation it is?

wondervictor • Oct 27 '22 02:10

The operation is torch.bmm() in decoder.py, line 79:

inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))

Assuming an input tensor of shape (B, 3, 512, 512), this becomes inst_features = (B, 400, 4096) x (B, 4096, 256). After this matrix multiplication I get the exploded min/max values.
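A quick shape check with dummy tensors (assuming a stride-8 feature map, i.e. 64x64 = 4096 positions, to match the numbers above) shows why this reduction is so overflow-prone: every output entry sums 4096 products.

import torch

B, C, H, W = 2, 256, 64, 64           # 64 = 512 / 8, matching the 4096 positions quoted above
iam_prob = torch.rand(B, 400, H * W)  # instance activation maps, values in [0, 1]
features = torch.randn(B, C, H, W)

inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))
print(inst_features.shape)            # torch.Size([2, 400, 256])
print(inst_features.abs().max())      # large magnitude: each entry is a sum over 4096 terms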

Stephenzza • Oct 27 '22 06:10

I see. I have some suggestions to avoid the overflow:

  1. Normalize iam_prob first and then do the matmul:
normalizer = iam_prob.sum(-1).float().clamp(min=1e-4)
iam_prob = iam_prob / normalizer[:, :, None]
inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))
  2. Use a large constant factor to avoid the overflow and scale back after the matmul (normalizer is the same iam_prob sum as above, which the decoder already computes):
features = features / 1000
inst_features = torch.bmm(iam_prob, features.view(B, C, -1).permute(0, 2, 1))
inst_features = inst_features / normalizer[:, :, None]
inst_features = inst_features * 1000

A runnable sketch of both options follows below.
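For reference, a small sketch of both options on dummy tensors (the normalizer here is assumed to be the iam_prob.sum(-1) term mentioned above; shapes follow the (B, 400, 4096) x (B, 4096, 256) numbers from the earlier comment):

import torch

B, C, N = 2, 256, 4096
iam_prob = torch.rand(B, 400, N)
features = torch.randn(B, C, 64, 64)
flat = features.view(B, C, -1).permute(0, 2, 1)   # (B, 4096, 256)

normalizer = iam_prob.sum(-1).float().clamp(min=1e-4)

# Option 1: normalize iam_prob before the matmul, so each output entry is a
# weighted average instead of a sum over 4096 positions.
iam_norm = iam_prob / normalizer[:, :, None]
out1 = torch.bmm(iam_norm, flat)

# Option 2: scale the features down by a constant, run the matmul and the usual
# normalization, then scale back up afterwards.
out2 = torch.bmm(iam_prob, flat / 1000)
out2 = out2 / normalizer[:, :, None]
out2 = out2 * 1000

print(out1.abs().max(), out2.abs().max())
# out1 and out2 agree numerically (bmm is linear), but with option 2 the intermediate
# bmm output is 1000x smaller, which is what keeps the quantized tensor in range.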

I'm glad to hear about the progress and hope my suggestions will work.

wondervictor • Oct 28 '22 02:10