GRU model shows no multi-batch inference speedup on the CPU backend
I benchmarked a GRU model at batch 1 and batch 512 on the Intel CPU backend. The batch-512 inference time is roughly 512× the batch-1 time. Setting nthread to 1, 4, and 32 does not change the inference time at all, even though CPU utilization rises to 100%, 400%, and 3100% respectively. Is this expected?

The model was exported to ONNX and then converted to MNN. The batch-1 model's input shapes are [1,1,47] + [2,1,64]; the batch-512 model's input shapes are [512,1,47] + [2,512,64].
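(For reference, a minimal sketch of the benchmark loop using MNN's Python API, assuming the model file name from the logs further down this thread; `numThread` in the session config is what the nthread values above refer to:)

```python
import time
import MNN

# Hypothetical sketch: time the same MNN model under different thread counts.
interpreter = MNN.Interpreter("gru_b512.mnn")  # file name assumed from the logs below
for n_thread in (1, 4, 32):
    session = interpreter.createSession({"numThread": n_thread})
    interpreter.runSession(session)  # warm-up run
    t0 = time.time()
    for _ in range(50):
        interpreter.runSession(session)
    print(f"numThread={n_thread}: {(time.time() - t0) / 50 * 1000:.3f} ms/iter")
```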
Could you send us the model to take a look? Email: [email protected], packed as a .rar.
Also, please first use the ModuleBasic.out tool to check which operator takes the most time.
I can't upload the model due to network access restrictions. The test uses the simplest possible GRU model built in PyTorch, with only three layers: nn.GRU + nn.Sigmoid + nn.Linear, exported to ONNX via torch.export and then converted to MNN.
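For reproducibility, here is a minimal sketch of a model matching that description. The layer sizes and `batch_first=True` are inferred from the input shapes [B,1,47] + [2,B,64] quoted above; the class name, Linear output width, and file name are placeholders:

```python
import torch
import torch.nn as nn

class TinyGRU(nn.Module):
    """Hypothetical reconstruction: nn.GRU + nn.Sigmoid + nn.Linear only."""
    def __init__(self, in_dim=47, hidden=64, num_layers=2, out_dim=1):
        super().__init__()
        # batch_first=True matches inputs shaped [B, T=1, 47]; h0 stays [2, B, 64].
        self.gru = nn.GRU(in_dim, hidden, num_layers=num_layers, batch_first=True)
        self.act = nn.Sigmoid()
        self.fc = nn.Linear(hidden, out_dim)

    def forward(self, x, h0):
        y, hn = self.gru(x, h0)
        return self.fc(self.act(y)), hn

model = TinyGRU().eval()
x, h0 = torch.randn(1, 1, 47), torch.zeros(2, 1, 64)  # batch-1 variant
torch.onnx.export(model, (x, h0), "gru_b1.onnx",
                  input_names=["x", "h0"], output_names=["output", "h"])
```

The output names "output" and "h" are chosen to match the tensor names that appear in the MNNV2Basic.out logs below.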
I just ran a test, and the batch-512 time is not 512× the batch-1 time.
Are you exporting a batch-512 ONNX model, converting it to MNN, and calling runSession directly, or exporting a batch-1 ONNX model, converting it to MNN, resizing the inputTensor, and then calling runSession?
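To make the distinction concrete, the second workflow (batch-1 model resized at runtime) would look roughly like this with MNN's Python API; the input tensor names `x`/`h0` are assumptions carried over from the export sketch above:

```python
import MNN

# Hypothetical sketch of the "convert batch-1 model, then resize to 512" path.
interpreter = MNN.Interpreter("gru_b1.mnn")  # batch-1 model, file name assumed
session = interpreter.createSession()
x = interpreter.getSessionInput(session, "x")
h0 = interpreter.getSessionInput(session, "h0")
interpreter.resizeTensor(x, (512, 1, 47))
interpreter.resizeTensor(h0, (2, 512, 64))
interpreter.resizeSession(session)  # re-plan shapes/memory before running
interpreter.runSession(session)
```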
Performance measured with the MNNV2Basic.out tool, single thread vs. 32 threads:

```
./MNNV2Basic.out gru_b512.mnn 50 0 0 0 1
Use extra forward type: 0
Open Model gru_b512.mnn
Load Cache file error.
The device support i8sdot:0, support fp16:0, support i8mm: 0
test_main, 282, cost time: 0.810000 ms
Session Info: memory use 1.055801 MB, flops is 0.3687006 M, backendType is 13
Input size:65536
Session Resize Done.
Session Start running...
Tensor shape: 2, 512, 64,
fileName.str().c_str()=./input_0.txt in _loadInputFromFile, 110
output: h
output: output
precision:2, memory: 0, Run 50 time:
Avg= 20.475580 ms, OpSum = 21.660240 ms min= 20.360001 ms, max= 21.699001 ms
```
```
./MNNV2Basic.out gru_b512.mnn 50 0 0 32
Use extra forward type: 0
Open Model gru_b512.mnn
Load Cache file error.
The device support i8sdot:0, support fp16:0, support i8mm: 0
test_main, 282, cost time: 1.670000 ms
Session Info: memory use 1.055801 MB, flops is 0.368706 M, backendType is 13
Session Resize Done.
Session Start running...
Tensor shape: 2, 512, 64,
Input size:65536
fileName.str().c_str()=./input_0.txt in _loadInputFromFile, 110
output: h
output: output
precision:2, memory: 0, Run 50 time:
Avg= 25.187241 ms, OpSum = 26.607084 ms min= 25.158001 ms, max= 25.363001 ms
```
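A quick back-of-the-envelope check on the numbers above (values copied from the two runs) shows that 32 threads is actually slightly slower than 1 thread for this model, consistent with the original report that changing nthread does not reduce latency:

```python
# Values taken verbatim from the two MNNV2Basic.out runs above.
batch = 512
avg_1_thread = 20.475580    # ms, numberThread = 1
avg_32_thread = 25.187241   # ms, numberThread = 32

print(f"per-sample cost, 1 thread:   {avg_1_thread / batch:.4f} ms")   # ~0.0400 ms
print(f"per-sample cost, 32 threads: {avg_32_thread / batch:.4f} ms")  # ~0.0492 ms
print(f"32-thread speedup: {avg_1_thread / avg_32_thread:.2f}x")       # 0.81x, i.e. slower
```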
Marking as stale. No activity in 60 days.