
integration to tiny-cnn

Open edgarriba opened this issue 8 years ago • 100 comments

@naibaf7 @bhack

I open this ticket in order to discuss ideas for integrating libdnn into tiny-cnn.

Currently, I have implemented a small interface to get a native OpenCL context from tiny-cnn: https://github.com/edgarriba/tiny-cnn/blob/f4d9e1d4f45ad8ac46824b5c007f5507fe8925eb/tiny_cnn/core/session.h

Things I think that are needed:

  • Implement a module for data transfer between devices
  • Discuss the shape of a simplified libdnn interface, if one is needed.

BTW, @naibaf7 @hughperkins note that we are planning to migrate tiny-cnn to an organization account and to rename the library itself, since it's now more a pure DNN lib than just a CNN one. Maybe you are interested in getting more involved in the development. https://github.com/nyanp/tiny-cnn/issues/235

edgarriba avatar Jul 18 '16 10:07 edgarriba

@edgarriba Ok, I'll look into it. What is your idea about the data transfer and simplified interface?

naibaf7 avatar Jul 18 '16 11:07 naibaf7

@naibaf7 For memory syncing I was thinking of an approach similar to TF's. What do you think is the best way to start? https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/common_runtime/copy_tensor.cc#L46
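A rough sketch of what such a copy dispatch could look like (the names `location`, `buffer` and `copy_between` are purely hypothetical, not existing tiny-cnn, libdnn or TF API):

    #include <cstddef>
    #include <cstring>

    // Hypothetical sketch of a TF-style copy dispatch; it only illustrates the idea.
    enum class location { host, device };

    struct buffer {
      location    loc;
      void*       ptr;    // host pointer or cl_mem handle, depending on loc
      std::size_t bytes;
    };

    void copy_between(const buffer& src, buffer& dst) {
      if (src.loc == location::host && dst.loc == location::device) {
        // enqueue a host-to-device write (e.g. clEnqueueWriteBuffer)
      } else if (src.loc == location::device && dst.loc == location::host) {
        // enqueue a device-to-host read (e.g. clEnqueueReadBuffer)
      } else if (src.loc == location::device && dst.loc == location::device) {
        // device-to-device copy (e.g. clEnqueueCopyBuffer), possibly staged via the host
      } else {
        std::memcpy(dst.ptr, src.ptr, src.bytes);  // plain host-to-host copy
      }
    }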

Regarding the simplified interface, it's just to make clear what types of data the libdnn interfaces will need.

edgarriba avatar Jul 18 '16 15:07 edgarriba

@naibaf7 What do you think of this design?

bhack avatar Jul 18 '16 20:07 bhack

@bhack Not bad, but essentially no different from functions already existing in either ViennaCL or Caffe code. It's really unfortunate that tiny-cnn can't handle any GPU-related management itself at the moment, as both cuDNN and libDNN expect proper GPU & memory initialization beforehand (and I oriented the libDNN interface closely to the cuDNN one).

naibaf7 avatar Jul 18 '16 20:07 naibaf7

What do you want our responsibilities to be? Device, memory and context? Kernel launch?

bhack avatar Jul 18 '16 21:07 bhack

@bhack If you want compatibility with typical GPU libraries, then optimally device, memory and context. Kernel launches are handled internally by both cuDNN and libDNN, because they need tricky parameter selection.

But if you can wait 2 days, I will update libDNN with simplifications for the device, memory and context part (basically copied over from Caffe): just wrappers around CUDA and ViennaCL for memory allocation, device listing & initialization, and memory transfer.
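For reference, a minimal sketch of device listing and context setup on top of ViennaCL's OpenCL backend (assuming the ViennaCL headers and the get_platforms / setup_context / switch_context helpers as used in the OpenCL Caffe branch):

    #include <iostream>
    #include <vector>

    #include "viennacl/ocl/platform.hpp"
    #include "viennacl/ocl/device.hpp"
    #include "viennacl/ocl/backend.hpp"

    void list_and_init_first_device() {
      // Enumerate all OpenCL platforms and their devices.
      std::vector<viennacl::ocl::platform> platforms = viennacl::ocl::get_platforms();
      for (auto& pf : platforms) {
        for (auto& dev : pf.devices()) {
          std::cout << pf.info() << " : " << dev.name() << std::endl;
        }
      }
      // Bind the first device of the first platform to ViennaCL context 0
      // and make it the current context.
      viennacl::ocl::setup_context(0, platforms[0].devices()[0]);
      viennacl::ocl::switch_context(0);
    }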

naibaf7 avatar Jul 18 '16 21:07 naibaf7

We have no problem handling device, memory and context. If you think it would be useful to have these features here, OK; if not, we will implement them in tiny-cnn. We need to think of the largest audience for this library. So if you think callers generally benefit from handling the context, device and memory themselves for other kinds of operations, it is fine for us to implement that in tiny-cnn and go ahead with porting more kernel coverage here.

bhack avatar Jul 18 '16 21:07 bhack

@naibaf7 Here is the error report I get: http://pastebin.com/yv6rmu21

edgarriba avatar Jul 27 '16 09:07 edgarriba

@edgarriba What steps do you use to test the integration (just so I can replicate)...? The error looks familiar from early OpenCL Caffe tests, so it should be something I can fix.

naibaf7 avatar Jul 27 '16 09:07 naibaf7

with this branch: https://github.com/edgarriba/tiny-cnn/tree/libdnn

  cmake -DBUILD_TESTS=ON -DUSE_OPENCL=ON -DUSE_LIBDNN=ON -DUSE_TBB=ON ..
  make && ./test/tiny_cnn_test

important routines: https://github.com/edgarriba/tiny-cnn/blob/libdnn/tiny_cnn/core/backend_dnn.h#L76 https://github.com/edgarriba/tiny-cnn/blob/libdnn/tiny_cnn/core/kernels/libdnn_conv2d_kernel.h#L54

edgarriba avatar Jul 27 '16 09:07 edgarriba

@edgarriba OK, I gained a good understanding of what you are trying to do and what issues the code has. I filed a pull request against your tiny-cnn libdnn branch which fixes some of the issues and explains some details of the problems that still exist.

If you need more assistance in fixing these, or want to discuss it in more detail, we can schedule a Skype conversation with screen sharing and go over the code that way.

naibaf7 avatar Jul 28 '16 04:07 naibaf7

By the way, random observation, not sure if this is useful or not, but if you use the clew abstraction layer, you can build for OpenCL without libOpenCL.so being present, and you can even run your program without libOpenCL.so being present. Your program can make a runtime decision about whether to try binding with OpenCL or not. Actually, I think it can even attempt to call clewInit() and thus detect whether libOpenCL.so is present, and deal with it accordingly. (Not sure about this last sentence; it would be easy to fix if it's not actually the case.)
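A minimal sketch of that runtime decision (assuming hughperkins/clew, where clewInit() takes no arguments and returns CLEW_SUCCESS once libOpenCL.so has been loaded):

    #include <cstdio>
    #include "clew.h"

    int main() {
      // Bind the OpenCL symbols at runtime; there is no link-time
      // dependency on libOpenCL.so at all.
      if (clewInit() != CLEW_SUCCESS) {
        std::printf("no OpenCL runtime found, using the CPU fallback path\n");
        return 0;
      }
      std::printf("OpenCL runtime loaded, GPU backend available\n");
      // ...from here on, the usual clGetPlatformIDs()/clCreateContext() calls work.
      return 0;
    }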

hughperkins avatar Aug 03 '16 14:08 hughperkins

@hughperkins Intriguing! Does that also work with CUDA? Because it would be really useful if there could be a universal binary that has no fixed dependencies beyond C++11 :) What about multi-location search for libOpenCL.so and nvrtc.so/cuda.so? Can it figure out these different locations?

naibaf7 avatar Aug 03 '16 16:08 naibaf7

Does that also work with CUDA?

No

Because it would be really useful if there could be a universal binary that has no fixed dependencies beyond C++11 :)

:-) Well, I guess someone could create something analogous for CUDA. Maybe... you? :-)

What about multi-location search for libOpenCL.so and nvrtc.so/cuda.so? Can it figure out these different locations?

For libOpenCL.so, it searches in a couple of hard-coded locations:

https://github.com/hughperkins/clew/blob/master/src/clew.c#L180-L184

    module = CLEW_DYNLIB_OPEN("OpenCL.dll");
    if(module == 0) module = CLEW_DYNLIB_OPEN("/Library/Frameworks/OpenCL.framework/OpenCL");
    if(module == 0) module = CLEW_DYNLIB_OPEN("libOpenCL.so");
    if(module == 0) module = CLEW_DYNLIB_OPEN("libOpenCL.so.1");
    if(module == 0) module = CLEW_DYNLIB_OPEN("/usr/lib/libOpenCL.so");
    if(module == 0) module = CLEW_DYNLIB_OPEN("/usr/lib/libOpenCL.so.1");

You can see that there is no reason why this couldn't be extended arbitrarily and/or read from some config file. But this covers almost all, or perhaps all, cases I've seen, I think?

hughperkins avatar Aug 04 '16 02:08 hughperkins

@hughperkins Yeah, I see... yes, maybe you could add possible locations for SDKs, such as AMD's APP SDK libOpenCL.so and nVidia's libOpenCL.so within CUDA :) This also works fine with the OpenCL ICD loader, I assume?
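For example, hypothetical additional entries in the same style (these paths are just typical install locations for the AMD APP SDK and the CUDA toolkit, not guaranteed to exist on a given system):

    if(module == 0) module = CLEW_DYNLIB_OPEN("/opt/AMDAPPSDK-3.0/lib/x86_64/libOpenCL.so");
    if(module == 0) module = CLEW_DYNLIB_OPEN("/usr/local/cuda/lib64/libOpenCL.so");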

naibaf7 avatar Aug 04 '16 12:08 naibaf7

https://github.com/CudaWrangler/cuew

bhack avatar Aug 04 '16 21:08 bhack

Wow, you are just a mine of useful information, bhack. It's amazing. Are you an AI? :-P

hughperkins avatar Aug 05 '16 00:08 hughperkins

@bhack @hughperkins More importantly, why does bhack know that much and take all the time to respond? :) amazing... but yeah oh well, cuew and clew are two more items to integrate into libDNN then... endless list of work :)

naibaf7 avatar Aug 05 '16 00:08 naibaf7

This also works fine with the OpenCL ICD loader I assume?

yes. the sequence is:

clew => loads libopencl.so => reads icd => loads vendor drivers => loads device information etc

hughperkins avatar Aug 05 '16 00:08 hughperkins

@edgarriba Can you post the UML rendered image of the integration proposal?

bhack avatar Aug 05 '16 06:08 bhack

I don't think that Caffe is the right testbed for libdnn, because libdnn was in some sense designed around Caffe. So if you want to give some feedback on Edgar's design...

bhack avatar Aug 05 '16 06:08 bhack

http://uml.mvnsearch.org/gist/df8ae6e567cb85ea62e7d20a8081f89a

edgarriba avatar Aug 05 '16 07:08 edgarriba

Feedback is welcome!

edgarriba avatar Aug 05 '16 07:08 edgarriba

Another interesting roadmap to monitor is https://phabricator.pmoreau.org/w/mesa/opencl_through_spirv_current_status/

bhack avatar Aug 05 '16 11:08 bhack

@CNugteren I think that you could be interested in the last messages.

bhack avatar Aug 05 '16 20:08 bhack

As you can see, we have started to use CLCudaAPI for tiny-cnn, but we are also integrating libdnn for the convolution kernels. @CNugteren said he was interested in contributing to libdnn. Through this GSoC integration experiment I see a little bit of feature duplication: libdnn maintains its own tuning class while Cedric has CLTune; libdnn uses ViennaCL as its OpenCL/CUDA helper while Cedric has CLCudaAPI; libdnn uses some algebra from ViennaCL while Cedric uses CLBlast. Is there a bit of duplicated effort?

bhack avatar Aug 06 '16 09:08 bhack

@bhack Yes, there certainly is a little bit of duplication, but I don't think that's an issue. Inside Caffe I even use the duplication as an advantage and offer clBLAS, CLBlast and ViennaCL for the linear algebra part. It's just that, especially with the algebra, different libraries perform well or badly on different devices.

naibaf7 avatar Aug 06 '16 11:08 naibaf7

Can't we share the tuning component? Also, I think CLCudaAPI is neutral enough to use both ViennaCL and CLBlast algebra.

bhack avatar Aug 06 '16 12:08 bhack

@bhack No, the way the code generation works in libDNN is too different from the CLTune approach, I think. Yes, when you use CLCudaAPI you can also make use of multiple BLAS libraries, of course.

naibaf7 avatar Aug 06 '16 12:08 naibaf7

I think naibaf7 is right: libDNN generates its OpenCL kernels from C++ code, whereas CLTune assumes that you already wrote a complete and functioning OpenCL kernel with tuning parameters exposed through the pre-processor (#define and #if - #else - #endif). Re-writing libDNN in this way is of course possible, but if the tuning is already working as it is now, why bother?
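For illustration, a hypothetical kernel fragment in the CLTune style (not taken from CLTune or libDNN): TILE_SIZE and USE_LOCAL_MEM stand for the exposed tuning parameters, which the tuner would redefine (e.g. via -D compiler flags) for every variant it benchmarks.

    // Hypothetical tunable copy kernel; assumes the local work size equals TILE_SIZE.
    #ifndef TILE_SIZE
    #define TILE_SIZE 16
    #endif
    #ifndef USE_LOCAL_MEM
    #define USE_LOCAL_MEM 0
    #endif

    __kernel void copy_tiled(__global const float* in, __global float* out) {
    #if USE_LOCAL_MEM == 1
      __local float tile[TILE_SIZE];
      tile[get_local_id(0)] = in[get_global_id(0)];
      barrier(CLK_LOCAL_MEM_FENCE);
      out[get_global_id(0)] = tile[get_local_id(0)];
    #else
      out[get_global_id(0)] = in[get_global_id(0)];
    #endif
    }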

Using CLCudaAPI is orthogonal to using CLTune, of course. It is just a way to make using OpenCL from C++ a lot friendlier, with the added bonus of making porting to CUDA as easy as changing a header.

CNugteren avatar Aug 06 '16 12:08 CNugteren