
[WIP] rnn/lstm/gru dynamic quantization

Open nihui opened this issue 1 year ago • 1 comment

  • [x] rnn
  • [x] rnn-arm
  • [x] lstm
  • [x] lstm-arm
  • [x] lstm-x86
  • [x] gru
  • [x] gru-arm
  • [x] fix s8 overload
  • [ ] coverage
  • [ ] doc
  • [ ] speed test
  • [x] rnn aq
  • [x] rnn-arm aq
  • [x] lstm aq
  • [x] lstm-arm aq
  • [x] lstm-x86 aq
  • [x] gru aq
  • [x] gru-arm aq
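Dynamic quantization here means the recurrent weight matrices are stored as int8 with per-output-channel scales, while the activation scale is computed on the fly from each input's absmax; the matmul accumulates in int32 and dequantizes at the end. A minimal numpy sketch of that idea (illustrative names, not ncnn's actual kernel):

```python
import numpy as np

def quantize_s8(t, axis=None):
    # absmax scale mapping the observed float range onto [-127, 127]
    if axis is None:
        absmax = np.abs(t).max()
    else:
        absmax = np.abs(t).max(axis=axis, keepdims=True)
    scale = np.where(absmax == 0, 1.0, absmax / 127.0)
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

def dynamic_int8_matvec(w, x):
    # weights: quantized once, per output channel (row-wise scales)
    # activations: quantized per call with a fresh scale ("dynamic")
    qw, w_scale = quantize_s8(w, axis=1)
    qx, x_scale = quantize_s8(x)
    acc = qw.astype(np.int32) @ qx.astype(np.int32)  # int32 accumulation
    return acc * (w_scale.squeeze(1) * x_scale)     # dequantize

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal(256).astype(np.float32)

mae = np.abs(dynamic_int8_matvec(w, x) - w @ x).mean()
```

Because the activation scale is recomputed per call, no calibration dataset is needed, which is what makes this attractive for RNN/LSTM/GRU where hidden-state ranges vary over time.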

nihui avatar Apr 18 '24 06:04 nihui

Awesome!

csukuangfj avatar Apr 29 '24 13:04 csukuangfj

import torch
import torch.nn as nn
import torch.nn.functional as F
import pnnx

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()

        self.rnn = nn.RNN(input_size=256, hidden_size=256, num_layers=30)
        self.lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=30)
        self.gru = nn.GRU(input_size=256, hidden_size=256, num_layers=30)

    def forward(self, x):
        out0, _ = self.rnn(x)
        out1, _ = self.lstm(x)
        out2, _ = self.gru(x)
        return out0, out1, out2

net = Model().half().float()
net.eval()

torch.manual_seed(0)
x = torch.rand(300, 1, 256)

pnnx.export(net, "rnn.pt", x)

Then convert the exported model to int8 (the calibration table argument is /dev/null since the RNN weights are quantized dynamically, without a calibration dataset):

ncnn2int8 rnn.ncnn.param rnn.ncnn.bin rnn-int8.ncnn.param rnn-int8.ncnn.bin /dev/null
rnn / rnn-int8 .bin    fp16     int8
model size             60.1M    30.6M
qcom855plus MAE        fp32     fp16        int8
30-layer rnn           0        2.29E-08    7.31E-08
30-layer lstm          0        4.39E-09    5.54E-09
30-layer gru           0        6.75E-09    1.96E-08
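The MAE figures above compare each quantized model's output elementwise against the fp32 reference. For completeness, the metric itself is just:

```python
import numpy as np

def mae(ref, test):
    # mean absolute error between a reference tensor and a test tensor
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    return np.abs(ref - test).mean()

err = mae([1.0, 2.0], [1.0, 2.5])
```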
qcom855plus single-thread latency    fp32      fp16      int8
30-layer rnn                         45.16     24.81     19.87
30-layer lstm                        256.51    121.99    60.7
30-layer gru                         167.52    94.68     46.29
i5-12400 single-thread latency, 30-layer lstm int8 model
naive(sse2)     95.24
sse2            87.02
avx             64.85
avx2            42.22
avxvnni         23.24
avx512          27.95
avx512vnni      15.8
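For context, the per-ISA speedups relative to the naive kernel can be read straight off the table:

```python
# single-thread timings for the 30-layer lstm int8 model on i5-12400,
# copied from the table above
timings = {
    "naive(sse2)": 95.24,
    "sse2": 87.02,
    "avx": 64.85,
    "avx2": 42.22,
    "avxvnni": 23.24,
    "avx512": 27.95,
    "avx512vnni": 15.8,
}

baseline = timings["naive(sse2)"]
speedups = {isa: round(baseline / t, 2) for isa, t in timings.items()}
```

One notable pattern in the data: avxvnni beats plain avx512 here, suggesting the vnni int8 dot-product instructions matter more than vector width for this workload.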

nihui avatar May 07 '24 06:05 nihui

imx6d single-thread latency    fp32       int8
30-layer rnn                   1392.22    504.83
30-layer lstm                  6063.91    1833.46
30-layer gru                   4357.59    1300.93

nihui avatar May 08 '24 07:05 nihui