yolov5-net
Performance suggestion
This is a great project, but unfortunately performance is pretty awful compared to running the same models in Python. It takes so long to prepare the image and to parse the results that I don't think it really even matters if you run the model on the GPU or not.
I think one thing that would improve things quite a bit would be to switch from Parallel.For to standard for loops on lines 102 and 106 of YoloScorer.cs.
In my (limited) testing I saw nearly a 50% speed boost from that change alone. The tiny amount of work done inside those loops hardly justifies the overhead of parallelization. Maybe leaving the Parallel.For on the outside would make sense, I don't know. Unfortunately I'm not really familiar with ML.NET or YOLO, but I'll see if there are any other easy ways to improve things. Otherwise it is a great project: very easy to use, with good results.
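To make that concrete, here is a rough sketch of the "Parallel.For on the outside, plain for on the inside" variant, written as a standalone helper. It assumes the loops in question are the pixel-extraction loops quoted later in this thread, that the project references System.Drawing.Common and Microsoft.ML.OnnxRuntime, and that unsafe blocks are enabled; the class and method names here are made up for illustration, not the library's actual code.

    using System.Drawing;
    using System.Drawing.Imaging;
    using System.Threading.Tasks;
    using Microsoft.ML.OnnxRuntime.Tensors;

    public static class PixelExtractionSketch
    {
        // Hypothetical variant of the pixel-extraction step: Parallel.For over
        // the rows only, with a plain for loop over the columns of each row.
        public static Tensor<float> ExtractPixels(Bitmap bitmap)
        {
            var rectangle = new Rectangle(0, 0, bitmap.Width, bitmap.Height);
            BitmapData bitmapData = bitmap.LockBits(rectangle, ImageLockMode.ReadOnly, bitmap.PixelFormat);
            int bytesPerPixel = Image.GetPixelFormatSize(bitmap.PixelFormat) / 8;

            var tensor = new DenseTensor<float>(new[] { 1, 3, bitmap.Height, bitmap.Width });

            Parallel.For(0, bitmapData.Height, y =>
            {
                unsafe
                {
                    byte* row = (byte*)bitmapData.Scan0 + (y * bitmapData.Stride);

                    for (int x = 0; x < bitmapData.Width; x++)
                    {
                        tensor[0, 0, y, x] = row[x * bytesPerPixel + 2] / 255.0F; // r
                        tensor[0, 1, y, x] = row[x * bytesPerPixel + 1] / 255.0F; // g
                        tensor[0, 2, y, x] = row[x * bytesPerPixel + 0] / 255.0F; // b
                    }
                }
            });

            bitmap.UnlockBits(bitmapData);

            return tensor;
        }
    }

Parallelizing only the rows gives each task a full row of work, which should be enough to amortize the Parallel.For overhead that per-pixel lambdas cannot.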
I did some optimizations. You can see Yolov5Net-Faster. I use OpenCL and OpenCV to boost the computations. I'm new to OpenCL and I'm not sure the OpenCL code is done the best way. (Sorry, my English is poor.) Where is faster
高慧觉? That sounds like the name of an eminent monk. Your project is great and processing is fast, but I don't know how to adjust the number of labels. Could you teach me? Right now I can only use an .onnx trained with 80 labels; I can't add labels.
I'm guessing this might be where your problem is.
I changed the two places above, but it didn't help; it still throws an error. Do you have QQ? Add me: 378234608
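For anyone else stuck on the label count: the usual route with this library is a custom model class whose Labels list matches the classes the .onnx was trained on and whose Dimensions equals the number of classes plus 5 (x, y, w, h, objectness). The sketch below is only an illustration; the names YoloCocoP5Model, YoloLabel, Dimensions, and Labels are assumptions about the library's API that should be checked against the source.

    using System.Collections.Generic;
    using Yolov5Net.Scorer;
    using Yolov5Net.Scorer.Models;

    // Hypothetical two-class model. The base class and property names are
    // assumptions about the library's API; verify them against the repository.
    public class YoloCustomModel : YoloCocoP5Model
    {
        public YoloCustomModel()
        {
            // Each detection row is: x, y, w, h, objectness + one score per class.
            Dimensions = 2 + 5;

            Labels = new List<YoloLabel>
            {
                new YoloLabel { Id = 1, Name = "cat" },
                new YoloLabel { Id = 2, Name = "dog" },
            };
        }
    }

Usage would then look something like (again assuming the library's scorer API): using var scorer = new YoloScorer<YoloCustomModel>("path/to/custom.onnx");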
I tried this out by changing the Parallel.For loops to regular for loops like this:
    for (int y = 0; y < bitmapData.Height; y++)
    {
        byte* row = (byte*)bitmapData.Scan0 + (y * bitmapData.Stride);

        for (int x = 0; x < bitmapData.Width; x++)
        {
            tensor[0, 0, y, x] = row[x * bytesPerPixel + 2] / 255.0F; // r
            tensor[0, 1, y, x] = row[x * bytesPerPixel + 1] / 255.0F; // g
            tensor[0, 2, y, x] = row[x * bytesPerPixel + 0] / 255.0F; // b
        }
    }
It is a lot slower this way for me. Were you running a Release build when you tested the speed, or running through vshost? Or perhaps you did things differently?
I work primarily on Linux, so it could have to do with different implementations of the .NET runtime. I did run a test on a Windows PC and performance seemed much better there, though I did not evaluate which method is best. Intuitively, though, since there is a fair amount of overhead in running Parallel.For, it would seem best to at least convert the inner loop to a standard for. It seems unlikely that we will see CPUs in the near future with more than bitmapData.Height hardware threads anyway.
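A side note on measuring this: debug builds and the Visual Studio hosting process can easily swamp differences of this size, so it is worth timing a Release build with a small harness. A minimal sketch, reusing the hypothetical PixelExtractionSketch helper from earlier in this thread and a blank 640x640 Bitmap as a stand-in input:

    using System;
    using System.Diagnostics;
    using System.Drawing;

    public static class ExtractPixelsTiming
    {
        public static void Main()
        {
            // Stand-in input; in practice this would be a resized/letterboxed frame.
            using var bitmap = new Bitmap(640, 640);

            const int warmup = 3, iterations = 20;

            // Warm up so JIT compilation doesn't skew the measurement.
            for (int i = 0; i < warmup; i++)
                PixelExtractionSketch.ExtractPixels(bitmap);

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
                PixelExtractionSketch.ExtractPixels(bitmap);
            sw.Stop();

            Console.WriteLine($"ExtractPixels: {sw.Elapsed.TotalMilliseconds / iterations:F2} ms per frame");
        }
    }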