shape_based_matching: abnormal fusion time

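Hello, following your earlier debugging advice, I tuned these parameters in an arm_linux environment: const int tileRows = 32; const int tileCols = 256; const int num_threads_ = 4; What I see is that every so often the fusion time becomes abnormal: it is off by 60-100 ms, whereas the usual deviation is 10-30 ms. I have spent quite a long time tuning and the problem is still not solved. Is there anything else that could be optimized? Thanks.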

Mar 05 '21 13:03 wly2020-robot

Is the abnormal time in the fusion stage or the matching stage?

Mar 05 '21 13:03 meiqua

It is the first timing measurement of the fusion stage.

Mar 06 '21 00:03 wly2020-robot

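Thinking about it, one place is quite likely to cause fluctuation: memory allocation.
This part is multi-threaded, and repeatedly freeing and re-allocating memory may be more than an embedded-class platform can handle. You could try handing it one fixed block of memory up front.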

Mar 06 '21 02:03 meiqua

This can be done with C++ placement new.

Mar 06 '21 02:03 meiqua

It may also help the overall time a little.

Mar 06 '21 02:03 meiqua

OK, but I find it hard to know where to start with using placement new to give buffer_0 and buffer_1 fixed memory.

Mar 06 '21 02:03 wly2020-robot

I rarely use placement new; the description says it constructs an object in memory that has already been allocated.
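For reference, a minimal sketch of that idea: the storage is reserved once, and objects are constructed and destroyed inside it without further heap calls. `TileBuffer` and `FixedSlot` are hypothetical names made up for this illustration; they are not the library's buffer_0/buffer_1 types, and the actual change in the library may look quite different.

```cpp
#include <cassert>
#include <new>

// Hypothetical stand-in for a per-tile scratch object (not a library type).
struct TileBuffer {
    int rows;
    int cols;
    TileBuffer(int r, int c) : rows(r), cols(c) {}
};

// Holds one fixed, correctly aligned block of raw storage that is reused.
class FixedSlot {
public:
    FixedSlot() = default;
    FixedSlot(const FixedSlot&) = delete;
    FixedSlot& operator=(const FixedSlot&) = delete;
    ~FixedSlot() { reset(); }

    // Construct a TileBuffer inside the existing storage (no heap allocation).
    TileBuffer* emplace(int rows, int cols) {
        reset();  // destroy any previous object first
        obj_ = new (storage_) TileBuffer(rows, cols);  // placement new
        return obj_;
    }

    // Run the destructor explicitly; the storage itself stays available.
    void reset() {
        if (obj_ != nullptr) {
            obj_->~TileBuffer();
            obj_ = nullptr;
        }
    }

    // Address of the fixed storage, useful for checking that it is reused.
    const void* address() const { return storage_; }

private:
    alignas(TileBuffer) unsigned char storage_[sizeof(TileBuffer)];
    TileBuffer* obj_ = nullptr;
};
```

Calling `emplace` repeatedly on the same `FixedSlot` always returns a pointer into the same storage, so the allocator is not involved in the hot loop. With several threads, each thread would need its own slot.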

Mar 06 '21 02:03 wly2020-robot

It needs some code changes. I'll take a look when I have time.

Mar 06 '21 02:03 meiqua

OK, thank you very much. Looking forward to it.

Mar 06 '21 02:03 wly2020-robot

I made a quick change; try the fix_memo branch.

Mar 07 '21 09:03 meiqua

I merged it in, then debugged and tested. There is almost no change: the first fusion time is still unstable and shows sudden spikes.

Mar 08 '21 08:03 wly2020-robot

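Below are the fusion time and match time after merging the code you gave me today: (image: new). Below are the timings with the earlier, unmodified code: (image: old)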

Mar 08 '21 09:03 wly2020-robot

Debug parameter settings: const int tileRows = 32; const int tileCols = 256; const int num_threads_ = 4;

Mar 08 '21 09:03 wly2020-robot

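On my desktop PC at work, with the same setup and detecting the same kind of part, the time is around 50 ms and fairly stable. The PC configuration is in the screenshot below; the NVIDIA CUDA runtime and the related drivers and development tools are installed. Adding OpenMP also made a fairly large difference to the algorithm's run time. Since the desktop reaches a total time of about 50 ms, I think the NVIDIA CUDA runtime plays an important role. If the algorithm ran on a TX2, would the time be more stable and also shorter?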

Mar 08 '21 10:03 wly2020-robot

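Below are the fusion time and match time measured on the hardware above under Windows 10, with otherwise identical conditions. The speed is acceptable: (image: windows 10 detect)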

Mar 08 '21 11:03 wly2020-robot

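It does not run on the GPU. In that case a finer-grained profile is needed. You could do this:

First turn OpenMP off and check, to confirm whether OpenMP is causing the problem.
Then time each stage of fusion in more detail, narrowing down as far as possible which part is jumping.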
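The per-stage timing suggested above could be sketched roughly as follows. `StageProfiler` and the stage labels are hypothetical helpers invented for this illustration; they are not part of shape_based_matching.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Accumulates wall-clock milliseconds per named stage.
// Stage names are free-form labels chosen by the caller.
class StageProfiler {
public:
    // RAII guard: measures from its construction to its destruction.
    class Scope {
    public:
        Scope(StageProfiler& profiler, const std::string& name)
            : profiler_(profiler),
              name_(name),
              start_(std::chrono::steady_clock::now()) {}

        ~Scope() {
            const auto end = std::chrono::steady_clock::now();
            const std::chrono::duration<double, std::milli> ms = end - start_;
            profiler_.total_ms_[name_] += ms.count();
            profiler_.calls_[name_] += 1;
        }

    private:
        StageProfiler& profiler_;
        std::string name_;
        std::chrono::steady_clock::time_point start_;
    };

    // Total time recorded for a stage; 0 if the stage was never measured.
    double total_ms(const std::string& name) const {
        const auto it = total_ms_.find(name);
        return it == total_ms_.end() ? 0.0 : it->second;
    }

    // Number of times a stage was measured.
    int calls(const std::string& name) const {
        const auto it = calls_.find(name);
        return it == calls_.end() ? 0 : it->second;
    }

private:
    std::map<std::string, double> total_ms_;
    std::map<std::string, int> calls_;
};
```

Wrapping each suspect block in `{ StageProfiler::Scope s(prof, "label"); ... }` and comparing the totals between a normal frame and a slow frame shows which stage accounts for the difference. This version is not thread-safe, which is one more reason to take the measurements with OpenMP turned off.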

Mar 08 '21 12:03 meiqua

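One correction: the earlier timings on Windows were measured without OpenMP support. Below are the fusion time and match time with OpenMP support added. The time is halved, so this does appear to be related to OpenMP.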

Mar 08 '21 12:03 wly2020-robot

Re-sending the image of the run results: (image: OpenMP)

Mar 08 '21 12:03 wly2020-robot

On Windows, adding OpenMP makes a clear difference. On Linux there is little difference with or without it. Could it be that the GPU is not being used?

Mar 08 '21 12:03 wly2020-robot

On Linux it is enabled by default. The GPU is not used at all in the first place.

Mar 08 '21 12:03 meiqua

I am now fairly sure that the abnormal time comes from the process function inside the match function.

Mar 08 '21 13:03 wly2020-robot

The process function contains several checks on the _OPENMP macro. It looks like none of the code under _OPENMP is actually running.

Mar 08 '21 13:03 wly2020-robot

With OpenMP turned off, does the time still fluctuate?

Mar 08 '21 13:03 meiqua

process is the main computation function, so if anything fluctuates it has to be this one. You can time the individual parts inside it.

Mar 08 '21 13:03 meiqua

On my work PC it does not. On the arm-linux dev board it still does. I build the code with Qt; how do I configure Qt so that the code under _OPENMP runs?

Mar 08 '21 13:03 wly2020-robot

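Normally, adding -fopenmp defines this macro automatically. As for the fluctuation, measure the time with OpenMP off first; with a single thread it is easier to pin down which part is responsible.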
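A small check like the one below can confirm whether a given build actually has OpenMP enabled. `openmp_enabled` and `max_threads` are hypothetical helper names for this sketch. For a GCC-based qmake project, the flag is typically passed by adding `QMAKE_CXXFLAGS += -fopenmp` and `LIBS += -fopenmp` to the `.pro` file; whether that matches the toolchain used on the board is an assumption.

```cpp
#include <cassert>

#ifdef _OPENMP
#include <omp.h>  // only available when the compiler enables OpenMP
#endif

// True if this translation unit was compiled with OpenMP enabled.
inline bool openmp_enabled() {
#ifdef _OPENMP
    return true;
#else
    return false;
#endif
}

// Upper bound on threads a parallel region may use; 1 when OpenMP is off.
inline int max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Printing both values once at startup, on the PC and on the board, shows directly whether the `#ifdef _OPENMP` branches are compiled in on each platform.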

Mar 08 '21 13:03 meiqua

OK, thanks.

Mar 08 '21 13:03 wly2020-robot

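Hello, through debugging I can now fairly well identify which code segment makes the run time unstable. The segment I located is:
// update one by one
for(int i=0; i<nodes_private.size(); i++) nodes_private[i]->update();
For the first fusion time, the big for loop in the process function runs 16 iterations, and the segment above takes a different amount of time on each one, generally fluctuating between 2 and 38 ms. For the second fusion time, the big for loop in process runs 4 iterations, and the segment is fairly stable, taking about 5 ms each time with little fluctuation.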
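One way to put numbers on this kind of per-iteration jitter is to record the time of every iteration and compare the minimum with the maximum. The sketch below assumes nothing about the library: `LoopStats` and `time_iterations` are hypothetical helpers, and the callable passed in stands for the update loop quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Summary of per-iteration wall-clock times, in milliseconds.
struct LoopStats {
    double min_ms;
    double max_ms;
    double mean_ms;
    std::size_t count;
};

// Calls body(i) for i in [0, iterations) and times each call separately.
template <typename F>
LoopStats time_iterations(std::size_t iterations, F body) {
    std::vector<double> samples;
    samples.reserve(iterations);  // avoid allocating inside the timed loop

    for (std::size_t i = 0; i < iterations; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        body(i);
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }

    LoopStats stats{0.0, 0.0, 0.0, samples.size()};
    if (samples.empty()) {
        return stats;
    }
    stats.min_ms = *std::min_element(samples.begin(), samples.end());
    stats.max_ms = *std::max_element(samples.begin(), samples.end());
    double sum = 0.0;
    for (double s : samples) {
        sum += s;
    }
    stats.mean_ms = sum / static_cast<double>(samples.size());
    return stats;
}
```

A large gap between `min_ms` and `max_ms` over the 16 iterations, with the same work done each time, would point toward something outside the arithmetic itself, such as memory allocation, cache behaviour, or CPU frequency scaling on the board. These are only candidate causes; the thread does not confirm any of them.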

Mar 09 '21 06:03 wly2020-robot

The debugging above was done with OpenMP turned off.

Mar 09 '21 06:03 wly2020-robot

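This update is where the main computation happens. The fix_memo branch now has newly added timing code; you can use it to see which step inside update fluctuates the most.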

Mar 09 '21 12:03 meiqua
