Zdahap

Results 2 issues of Zdahap

Platform (include target platform as well if cross-compiling): orin-cpu (supports the sdot instruction)
Reproduction probability: fairly high
Github Version: MNN tag 2.9.1
Compiling Method: cmake -DMNN_SUPPORT_TRANSFORMER_FUSE=ON -DMNN_LOW_MEMORY=ON -DMNN_BUILD_LLM=ON .....

question

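With the original approach:

std::unique_ptr<uint8_t[]> src{new uint8_t[NR_BYTES]{}};
std::unique_ptr<uint8_t[]> dst{new uint8_t[NR_BYTES]{}};

the measured bandwidth is on the low side, possibly only 14 GBps on the 8295. If this is changed to:

uint8_t* src = (uint8_t*)malloc(NR_BYTES * sizeof(uint8_t));
uint8_t* dst = (uint8_t*)malloc(NR_BYTES * sizeof(uint8_t));

the measured bandwidth is close to 36 GBps.

Test device: SA8295 big cores.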
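To make the comparison reproducible, the two allocation styles can be run side by side through the same timed copy loop. The following is a minimal sketch, not the benchmark from the issue: the function names `copy_bandwidth_gbps`, `bench_value_initialized`, and `bench_malloc` are hypothetical, the use of `memcpy` as the copy kernel is an assumption, and the buffer size is a parameter because the original value of `NR_BYTES` is not given. One difference worth checking is that `new uint8_t[n]{}` writes zeros to every byte before timing starts, while `malloc` leaves the buffers untouched; on Linux, untouched pages may not be backed by distinct physical memory yet, which could affect the number reported. Whether that explains the gap seen here is not confirmed by the source.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>

// Time `iters` memcpy calls of `n` bytes and return the copy rate in GB/s
// (10^9 payload bytes copied per second).
double copy_bandwidth_gbps(uint8_t* dst, const uint8_t* src, std::size_t n, int iters) {
    assert(dst != nullptr && src != nullptr && n > 0 && iters > 0);
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i) {
        std::memcpy(dst, src, n);
        // Compiler barrier (GCC/Clang extension) so the copies are not elided.
        asm volatile("" : : "r"(dst) : "memory");
    }
    const auto t1 = std::chrono::steady_clock::now();
    const double secs = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(n) * iters / secs / 1e9;
}

// Variant 1: value-initialized arrays. The trailing `{}` zero-fills every
// byte, so both buffers are written before the timed loop starts.
double bench_value_initialized(std::size_t n, int iters) {
    std::unique_ptr<uint8_t[]> src{new uint8_t[n]{}};
    std::unique_ptr<uint8_t[]> dst{new uint8_t[n]{}};
    return copy_bandwidth_gbps(dst.get(), src.get(), n, iters);
}

// Variant 2: malloc. The contents are uninitialized and the buffers are not
// written before the timed loop starts.
double bench_malloc(std::size_t n, int iters) {
    auto* src = static_cast<uint8_t*>(std::malloc(n));
    auto* dst = static_cast<uint8_t*>(std::malloc(n));
    if (src == nullptr || dst == nullptr) {
        std::free(src);
        std::free(dst);
        return -1.0;  // allocation failed
    }
    const double gbps = copy_bandwidth_gbps(dst, src, n, iters);
    std::free(dst);
    std::free(src);
    return gbps;
}
```

Calling `bench_value_initialized(n, iters)` and `bench_malloc(n, iters)` with the same arguments on the same core gives two directly comparable figures. Adding a `memset` over both `malloc` buffers before the timed loop would be one way to test whether the pre-touching of pages accounts for the difference.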