Fabian Tschopp
@zhenghuitian I suggest you read these two excellent articles first:
- https://github.com/clMathLibraries/clBLAS/wiki/AutoGemm
- http://www.cedricnugteren.nl/tutorial.php

Other than that, you mainly have to find out the required FLOPS per global memory read/write...
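For a rough idea of what that means in practice, here is a back-of-the-envelope model (a sketch, not LibDNN's actual tuning logic) for the arithmetic intensity of a tiled GEMM: with a TM x TN output tile, each K-step performs 2*TM*TN FLOPs while loading TM + TN elements from global memory, so larger tiles buy you more FLOPS per memory access at the cost of registers/local memory.

```cpp
#include <cstdio>

// Back-of-the-envelope arithmetic intensity of a tiled GEMM (C = A * B).
// TM x TN is the output tile per work-group; these are hypothetical
// tuning parameters, not LibDNN's actual ones.
double flops_per_load(double TM, double TN) {
  // Each K-step: 2*TM*TN FLOPs, TM + TN elements loaded from global memory.
  return 2.0 * TM * TN / (TM + TN);
}

int main() {
  for (int t = 4; t <= 64; t *= 2)
    std::printf("tile %2dx%-2d -> %4.0f FLOPs per element loaded\n",
                t, t, flops_per_load(t, t));
}
```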
@campbx It has OpenCL 1.1 support. It is not fully optimized yet, but it's faster than any im2col/col2im (explicit GEMM) implementation.
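To make the comparison concrete, here is a minimal CPU sketch of the im2col lowering that "explicit GEMM" refers to (single channel, stride 1, no padding; not Caffe's or LibDNN's actual code). Each kH x kW input patch becomes one matrix column, so the input is duplicated roughly kH*kW times in global memory before the GEMM even starts; an implicit-GEMM kernel forms these patches on the fly instead.

```cpp
#include <vector>

// Minimal im2col: single channel, stride 1, no padding.
// Output matrix has kH*kW rows and oH*oW columns.
std::vector<float> im2col(const std::vector<float>& in, int H, int W,
                          int kH, int kW) {
  int oH = H - kH + 1, oW = W - kW + 1;
  std::vector<float> col((size_t)kH * kW * oH * oW);
  for (int ky = 0; ky < kH; ++ky)
    for (int kx = 0; kx < kW; ++kx)
      for (int oy = 0; oy < oH; ++oy)
        for (int ox = 0; ox < oW; ++ox)
          // Row (ky, kx) of the patch matrix, column (oy, ox).
          col[(((size_t)ky * kW + kx) * oH + oy) * oW + ox] =
              in[(size_t)(oy + ky) * W + (ox + kx)];
  return col;
}
```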
@gstoner Thanks for the update. I'll be looking into it. There is certainly interest in making LibDNN compatible with new platforms as well. ViennaCL is not a strict requirement; the CUDA...
@bhack @gstoner Just as a heads-up, I'm currently writing pooling kernels for LibDNN. I found that they can also have a performance benefit (in the 10%-20% range for AlexNet total...
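For anyone unfamiliar with the operation, this is the shape of what those kernels compute; just a CPU reference of 2x2, stride-2 max pooling, not the LibDNN GPU code:

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Reference 2x2 max pooling with stride 2, single channel.
std::vector<float> max_pool_2x2(const std::vector<float>& in, int H, int W) {
  int oH = H / 2, oW = W / 2;
  std::vector<float> out((size_t)oH * oW,
                         -std::numeric_limits<float>::infinity());
  for (int y = 0; y < oH * 2; ++y)
    for (int x = 0; x < oW * 2; ++x) {
      // Each output cell takes the max over its 2x2 input window.
      float& o = out[(size_t)(y / 2) * oW + (x / 2)];
      o = std::max(o, in[(size_t)y * W + x]);
    }
  return out;
}
```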
@bhack I add kernels that have a significant performance effect, or that are required for work in my other projects, as I go along; it might violate the dogma ;)
Cool, will test next week :)
@romix - Yes, it's planned as soon as I'm done with my work on the Caffe branch, which includes finishing Quantization for LibDNN. There are a few issues with dependencies...
Yes exactly, the autotuner has been more of a proof of concept thus far. I'm also working with lower-end devices now (Raspberry Pi VC4CL and Mali T740) to check how the autotuner can...
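At its core the tuning loop is nothing fancy; a stripped-down sketch (the candidate list and launch callback are placeholders, not the actual tuner API):

```cpp
#include <chrono>
#include <cstdio>
#include <functional>
#include <vector>

// Time each candidate configuration and keep the fastest one.
int autotune(const std::vector<int>& tiles,
             const std::function<void(int)>& launch_kernel) {
  int best_tile = tiles.front();
  double best_ms = 1e30;
  for (int t : tiles) {
    auto t0 = std::chrono::steady_clock::now();
    launch_kernel(t);  // run (and synchronize) one timed iteration
    std::chrono::duration<double, std::milli> ms =
        std::chrono::steady_clock::now() - t0;
    if (ms.count() < best_ms) { best_ms = ms.count(); best_tile = t; }
  }
  std::printf("best tile: %d (%.2f ms)\n", best_tile, best_ms);
  return best_tile;
}
```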
It could be possible to add different layouts as an option; I haven't had time to look into it so far. Performance-wise, on ImageNet, my Vega Frontier Edition using...
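Assuming "different layouts" means the usual NCHW vs. NHWC choice (my assumption; the comment doesn't spell it out), the option mostly boils down to swapping the index arithmetic in the generated kernels:

```cpp
#include <cstddef>

// Flat-index helpers for the two common 4-D tensor layouts.
inline size_t idx_nchw(int n, int c, int h, int w, int C, int H, int W) {
  return (((size_t)n * C + c) * H + h) * W + w;
}
inline size_t idx_nhwc(int n, int c, int h, int w, int C, int H, int W) {
  return (((size_t)n * H + h) * W + w) * C + c;
}
```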
How do you allocate the memory for the filter, input, and output? You can also check what percentage of your GPU's peak performance you are utilizing. Radeon Pro 555 should...
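The utilization check is straightforward arithmetic; a sketch with placeholder numbers (layer shape, measured time, and peak GFLOPS all need to come from your own setup):

```cpp
#include <cstdio>

int main() {
  // Conv layer shape: batch N, Cin -> Cout channels, HxW output, kxk kernel.
  long long N = 1, Cin = 64, Cout = 64, H = 56, W = 56, k = 3;
  double flops = 2.0 * N * Cout * H * W * Cin * k * k;  // 2 FLOPs per MAC
  double measured_ms = 1.5;     // your measured kernel time
  double peak_gflops = 1300.0;  // your device's peak FP32 throughput
  double achieved = flops / (measured_ms * 1e-3) / 1e9;
  std::printf("achieved %.1f GFLOPS = %.1f%% of peak\n",
              achieved, 100.0 * achieved / peak_gflops);
}
```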