Runyu Lu comments

Results 17 comments of



Finish the heapsort of simplestl partial_sort

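@nihui I would like to ask a few other questions here:

* About argmax/argmin

> The [project task description](https://github.com/Tencent/OpenSourceTalent/issues/36) describes implementing the `argmax` and `argmin` layers, along with the corresponding pnnx conversion.

I checked the current ncnn repository and it already has an argmax implementation (I think the code is fairly simple, though it has some indentation issues and the omp usage could easily be optimized). pnnx also seems to have implemented `argmax`. My questions are:

1. Is it necessary to rewrite `argmin` again?
2. Is it necessary to add AVX and other instruction-set optimizations on the `x86` side?
3. Is it necessary to add `test_argmax.cpp` for verification?
4. What was the original intent behind assigning this task? It seems that many parts of the `argmax/argmin` task are already basically implemented in ncnn.

* About grid_sample

If none of the above is necessary, I will start developing and writing `grid_sample`.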
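For context on how small the shared logic between `argmax` and `argmin` can be, here is a minimal sketch over a flat float buffer. It is not ncnn's actual ArgMax layer code; the helper name `arg_extreme` and its `find_min` flag are hypothetical and only illustrate that the two operations differ by a single comparison.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of ncnn.
// Returns the index of the largest element, or of the smallest element
// when find_min is true. Ties resolve to the first occurrence.
// Returns -1 for an empty input.
static int arg_extreme(const std::vector<float>& v, bool find_min)
{
    if (v.empty())
        return -1;

    int best = 0;
    for (int i = 1; i < (int)v.size(); i++)
    {
        // The only difference between argmax and argmin is this comparison.
        const bool better = find_min ? (v[i] < v[best]) : (v[i] > v[best]);
        if (better)
            best = i;
    }
    return best;
}
```

A real layer would also have to handle the reduction axis, top-k output, and packed layouts, which this sketch leaves out on purpose.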

[InstanceNorm Optimize x86] AVX512/AVX/SSE intrinsic with elempack merged

> missing avx/avx512 optimization for pack4 and avx512 optimization for pack8 ?

If so, does the x86 part of batchnorm also need further optimization? @nihui

[InstanceNorm Optimize x86] AVX512/AVX/SSE intrinsic with elempack merged

> > > missing avx/avx512 optimization for pack4 and avx512 optimization for pack8 ? > > > > > > If so, does the x86 part of batchnorm also need...

Add Shape and Gather ops

You could use onnxsim to remove ops such as Shape.

Add Shape and Gather ops

> I've tried onnxsim, onnxoptimizer, and nvidia polygraphy with graphsurgeon, and none of them will get rid of these ops for whatever reason.

All right. Maybe you could use...

Memory Pool Improvement For Variadic Sized Inputs

> I've tried but cannot reproduce the error of `test_squeezenet`. Is there any way to run CTest with extra checks on?

Maybe you can refer to this [line](https://github.com/Tencent/ncnn/blob/d30fc825d404a8c575cd200af802fe9e51e2f729/.github/workflows/linux-x64-cpu-gcc-san.yml#L37) about...

[feat] cuda graph support and refactor non-functional api
