GRU model shows no multi-batch inference speedup on the CPU backend
I benchmarked a GRU model at batch 1 and batch 512 on the Intel CPU backend. The batch-512 inference time is roughly 512× the batch-1 time. Setting nthread to 1, 4, and 32 does not change the inference time at all, even though CPU utilization rises to 100%, 400%, and 3100% respectively. Is this expected?

The model was exported to ONNX and then converted to MNN. The batch-1 model's input shapes are [1,1,47] + [2,1,64]; the batch-512 model's input shapes are [512,1,47] + [2,512,64].
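(For reference, a minimal sketch of the benchmark loop using MNN's Python API, assuming the model file name from the logs further down this thread; `numThread` in the session config is what the nthread values above refer to:)

```python
import time
import MNN

# Hypothetical sketch: time the same MNN model under different thread counts.
interpreter = MNN.Interpreter("gru_b512.mnn")  # file name assumed from the logs below
for n_thread in (1, 4, 32):
    session = interpreter.createSession({"numThread": n_thread})
    interpreter.runSession(session)  # warm-up run
    t0 = time.time()
    for _ in range(50):
        interpreter.runSession(session)
    print(f"numThread={n_thread}: {(time.time() - t0) / 50 * 1000:.3f} ms/iter")
```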
Could you send us the model to take a look? Email: [email protected], packed as a .rar.
Also, please first use the ModuleBasic.out tool to check which operator takes the most time.
I can't upload the model due to network access restrictions. The test uses the simplest possible GRU model built in PyTorch, with only three layers: nn.GRU + nn.Sigmoid + nn.Linear, exported to ONNX via torch.export and then converted to MNN.
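For reproducibility, here is a minimal sketch of a model matching that description. The layer sizes and `batch_first=True` are inferred from the input shapes [B,1,47] + [2,B,64] quoted above; the class name, Linear output width, and file name are placeholders:

```python
import torch
import torch.nn as nn

class TinyGRU(nn.Module):
    """Hypothetical reconstruction: nn.GRU + nn.Sigmoid + nn.Linear only."""
    def __init__(self, in_dim=47, hidden=64, num_layers=2, out_dim=1):
        super().__init__()
        # batch_first=True matches inputs shaped [B, T=1, 47]; h0 stays [2, B, 64].
        self.gru = nn.GRU(in_dim, hidden, num_layers=num_layers, batch_first=True)
        self.act = nn.Sigmoid()
        self.fc = nn.Linear(hidden, out_dim)

    def forward(self, x, h0):
        y, hn = self.gru(x, h0)
        return self.fc(self.act(y)), hn

model = TinyGRU().eval()
x, h0 = torch.randn(1, 1, 47), torch.zeros(2, 1, 64)  # batch-1 variant
torch.onnx.export(model, (x, h0), "gru_b1.onnx",
                  input_names=["x", "h0"], output_names=["output", "h"])
```

The output names "output" and "h" are chosen to match the tensor names that appear in the MNNV2Basic.out logs below.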
I just ran a test, and the batch-512 time is not 512× the batch-1 time.
Are you exporting a batch-512 ONNX model, converting it to MNN, and calling runSession directly, or exporting a batch-1 ONNX model, converting it to MNN, resizing the inputTensor, and then calling runSession?
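To make the distinction concrete, the second workflow (batch-1 model resized at runtime) would look roughly like this with MNN's Python API; the input tensor names `x`/`h0` are assumptions carried over from the export sketch above:

```python
import MNN

# Hypothetical sketch of the "convert batch-1 model, then resize to 512" path.
interpreter = MNN.Interpreter("gru_b1.mnn")  # batch-1 model, file name assumed
session = interpreter.createSession()
x = interpreter.getSessionInput(session, "x")
h0 = interpreter.getSessionInput(session, "h0")
interpreter.resizeTensor(x, (512, 1, 47))
interpreter.resizeTensor(h0, (2, 512, 64))
interpreter.resizeSession(session)  # re-plan shapes/memory before running
interpreter.runSession(session)
```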
Performance measured with the MNNV2Basic.out tool, single thread vs. 32 threads:

```
./MNNV2Basic.out gru_b512.mnn 50 0 0 0 1
Use extra forward type: 0
Open Model gru_b512.mnn
Load Cache file error.
The device support i8sdot:0, support fp16:0, support i8mm: 0
test_main, 282, cost time: 0.810000 ms
Session Info: memory use 1.055801 MB, flops is 0.3687006 M, backendType is 13
Input size:65536
Session Resize Done.
Session Start running...
Tensor shape: 2, 512, 64,
fileName.str().c_str()=./input_0.txt in _loadInputFromFile, 110
output: h
output: output
precision:2, memory: 0, Run 50 time:
Avg= 20.475580 ms, OpSum = 21.660240 ms min= 20.360001 ms, max= 21.699001 ms
```
```
./MNNV2Basic.out gru_b512.mnn 50 0 0 32
Use extra forward type: 0
Open Model gru_b512.mnn
Load Cache file error.
The device support i8sdot:0, support fp16:0, support i8mm: 0
test_main, 282, cost time: 1.670000 ms
Session Info: memory use 1.055801 MB, flops is 0.368706 M, backendType is 13
Session Resize Done.
Session Start running...
Tensor shape: 2, 512, 64,
Input size:65536
fileName.str().c_str()=./input_0.txt in _loadInputFromFile, 110
output: h
output: output
precision:2, memory: 0, Run 50 time:
Avg= 25.187241 ms, OpSum = 26.607084 ms min= 25.158001 ms, max= 25.363001 ms
```
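A quick back-of-the-envelope check on the numbers above (values copied from the two runs) shows that 32 threads is actually slightly slower than 1 thread for this model, consistent with the original report that changing nthread does not reduce latency:

```python
# Values taken verbatim from the two MNNV2Basic.out runs above.
batch = 512
avg_1_thread = 20.475580    # ms, numberThread = 1
avg_32_thread = 25.187241   # ms, numberThread = 32

print(f"per-sample cost, 1 thread:   {avg_1_thread / batch:.4f} ms")   # ~0.0400 ms
print(f"per-sample cost, 32 threads: {avg_32_thread / batch:.4f} ms")  # ~0.0492 ms
print(f"32-thread speedup: {avg_1_thread / avg_32_thread:.2f}x")       # 0.81x, i.e. slower
```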
Marking as stale. No activity in 60 days.