Runyu Lu

Results 17 comments of Runyu Lu

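@nihui I'd like to ask a few other questions here:

* About argmax/argmin

  > The [project task description](https://github.com/Tencent/OpenSourceTalent/issues/36) describes implementing the `argmax` and `argmin` layers and the related pnnx conversion.

  Looking at the current ncnn repository, argmax is already implemented (the code looks fairly simple to me, though it has some indentation issues and the omp usage could easily be optimized), and pnnx also seems to have implemented `argmax`. My questions are:

  1. Is it necessary to write a separate implementation for `argmin`?
  2. Is it necessary to add AVX and other instruction-set optimizations on the `x86` side?
  3. Is it necessary to add `test_argmax.cpp` for verification?
  4. What was the original intent of this task? It seems that ncnn has already implemented most of `argmax/argmin`.

* About grid_sample

  If none of the above is necessary, I will start working on developing `grid_sample`.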
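On question 1, the two ops differ only in the comparison they use, so a separate `argmin` does not have to duplicate much. A minimal sketch, assuming a flat `std::vector<float>` input rather than ncnn's `Mat`; `arg_select`, `argmax`, and `argmin` below are hypothetical helpers for illustration, not ncnn's actual layer code:

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of ncnn): returns the index of the "best"
// element according to `better`, or -1 for an empty input.
// On ties, the first matching index is kept.
template <typename Compare>
int arg_select(const std::vector<float>& values, Compare better)
{
    if (values.empty())
        return -1;

    // The result is returned as int, so the input must fit in that range.
    assert(values.size() <= static_cast<std::size_t>(INT_MAX));

    std::size_t best = 0;
    for (std::size_t i = 1; i < values.size(); i++)
    {
        if (better(values[i], values[best]))
            best = i;
    }
    return static_cast<int>(best);
}

// argmax and argmin share the same loop and differ only in the comparator.
int argmax(const std::vector<float>& values)
{
    return arg_select(values, std::greater<float>());
}

int argmin(const std::vector<float>& values)
{
    return arg_select(values, std::less<float>());
}
```

A real layer would also need to handle the axis, top-k, and output-value options, which this sketch leaves out.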

> missing avx/avx512 optimization for pack4 and avx512 optimization for pack8 ?

If so, does the x86 part of batchnorm also need further optimization? @nihui

> > > missing avx/avx512 optimization for pack4 and avx512 optimization for pack8 ? > > > > > > If so, does the x86 part of batchnorm also need...

You could use onnxsim to remove ops such as Shape.

> I've tried onnxsim, onnxoptimizer, and nvidia polygraphy with graphsurgeon, and none of them will get rid of these ops for whatever reason.

All right. Maybe you could use...

> I've tried but cannot reproduce the error of `test_squeezenet`. Is there any way to run CTest with extra checks on?

Maybe you can refer to this [line](https://github.com/Tencent/ncnn/blob/d30fc825d404a8c575cd200af802fe9e51e2f729/.github/workflows/linux-x64-cpu-gcc-san.yml#L37) about...

Please don't merge yet; there are still a few bugs to fix. But feel free to review, because I've changed some of the API :)

Fixed the dynamic grid bugs for flash decoding. It now passes all the tests and can be merged :).

Unit Test: ![image](https://github.com/hpcaitech/ColossalAI/assets/77330637/98332a69-de68-481e-a2f0-5f38687a357f)