Mikhail Scherbina comments

Results: 49 comments by Mikhail Scherbina

MMPose segfault at the last feature

This somewhat coincides with the following line of code from the official codebase: https://github.com/open-mmlab/mmdeploy/blob/master/csrc/mmdeploy/codebase/mmpose/keypoints_from_heatmap.cpp#L179

Memory leak

Testing it with the master branch.

disable npp in multistream context

@cudawarped Should I be the one to do it, i.e., replace it with the contexted version?

disable npp in multistream context

I did a compile-time static dispatch to either the _Ctx or the non-_Ctx methods. Tested: it does not perform as well as OpenCV's, because it uses cv::cuda::GpuMat::setTo to fill the image with the border...
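For readers unfamiliar with the idea, here is a minimal sketch of compile-time dispatch between a context-taking function and its legacy counterpart. It is not the code from the pull request: `FakeStreamCtx`, `fakeCopyBorder`, `fakeCopyBorder_Ctx`, and `copyBorder` are hypothetical stand-ins invented for illustration, not NPP or OpenCV functions, and the sketch uses tag dispatch rather than macros.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a stream context; NOT the real NPP type.
struct FakeStreamCtx { int stream_id; };

// Hypothetical legacy function: no context argument.
inline std::string fakeCopyBorder(int width) {
    return "legacy:" + std::to_string(width);
}

// Hypothetical context-aware variant, mirroring the _Ctx naming convention.
inline std::string fakeCopyBorder_Ctx(int width, FakeStreamCtx ctx) {
    return "ctx" + std::to_string(ctx.stream_id) + ":" + std::to_string(width);
}

// Tag-dispatched overloads: the third parameter's type selects the overload
// during compilation, so there is no branch at run time.
inline std::string copyBorderImpl(int width, FakeStreamCtx ctx, std::true_type) {
    return fakeCopyBorder_Ctx(width, ctx);
}
inline std::string copyBorderImpl(int width, FakeStreamCtx, std::false_type) {
    return fakeCopyBorder(width);
}

// UseCtx would typically be derived from a build-time check of the library
// version; here it is simply a template argument.
template <bool UseCtx>
std::string copyBorder(int width, FakeStreamCtx ctx) {
    return copyBorderImpl(width, ctx, std::integral_constant<bool, UseCtx>{});
}

// Small self-check exercising both paths.
inline void demo() {
    assert(copyBorder<true>(640, FakeStreamCtx{2}) == "ctx2:640");
    assert(copyBorder<false>(640, FakeStreamCtx{2}) == "legacy:640");
}
```

The same selection could be written with `if constexpr` in C++17; tag dispatch is used here only so the sketch also compiles as C++11.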

disable npp in multistream context

You may not like my use of concatenation macros; they can be avoided.

NPP performance bottleneck with multiple streams

Trying the same experiment with CUDA_LAUNCH_BLOCKING=1 yields the same result: NPP is slow.

NPP performance bottleneck with multiple streams

Here is the sort of performance I get with this simple kernel:
```
__global__ void flip_kernel(const cv::cuda::PtrStepSz input, cv::cuda::PtrStepSz output) {
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    ...
```

NPP performance bottleneck with multiple streams

Looking at NVIDIA Nsight Compute, the NPP version is quite gappy and takes 0.9 ms ![image](https://user-images.githubusercontent.com/42784580/186446097-dd47b858-d3e4-48c7-910b-7845ee417f88.png) whereas without NPP it is much more saturated and takes 0.15 ms ![image](https://user-images.githubusercontent.com/42784580/186447339-2e8b4e5c-c729-4df2-930e-e9fc9309a2b5.png) Looking deeper into the issue...

NPP performance bottleneck with multiple streams

That's [an issue from 2016](https://stackoverflow.com/a/39204905), but I think that, theoretically, NPP should support multiple streams, as stated in the docs; the reality is weird... Maybe there should be an option...

NPP performance bottleneck with multiple streams

> Note: Also new to NPP 10.1 is support for application managed stream contexts. Application managed stream contexts make NPP truely stateless internally allowing for rapid, no overhead, stream context...