
adding fpga support

Open tonyzhenyuxu opened this issue 8 years ago • 10 comments

Could you please comment on supporting OpenCL on an FPGA device, such as the Altera Arria 10, instead of a GPU? Thanks.

tonyzhenyuxu avatar Dec 02 '16 20:12 tonyzhenyuxu

FPGAs typically work differently from discrete GPUs, in that the programming time is very long: hours rather than seconds.

For a discrete GPU, such as AMD, or NVIDIA, the workflow for OpenCL looks something like:

  1. program starts
  2. program initializes GPU device
  3. program loads OpenCL source-code, which looks like C code basically
  4. program gives OpenCL code to GPU driver, which compiles it to GPU object code, passes it to the GPU, and passes a handle back to the program
  5. program gives program handle to the GPU driver, along with some data, and GPU starts processing the data, following the logic in the program

For a discrete GPU, steps 1 to 3 take ~seconds. Step 4 takes ~seconds, or less. Step 5 takes as long as it takes: minutes/hours/days/weeks, depending on what you're doing/training.

For an FPGA, step 4 takes significantly longer: hours instead of seconds. So the workflow would be quite different. The compilation of the OpenCL essentially has to happen offline, rather than at runtime.

It's probably not a massively blocking change, but it would mean somewhat rethinking how the program runs. For example, EasyCL currently assumes that the OpenCL will be compiled at runtime. You'd need to partition EasyCL into two parts:

  • an offline compilation part, which you'd run once, at build time, and which would store the FPGA program somewhere. You'd then write the FPGA program to the FPGA, I believe
  • a runtime part, which would simply execute the program on the already-programmed FPGA
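
The runtime half of that split could look roughly like the sketch below. It is only an illustration under assumptions: `loadKernelBinary` is a hypothetical helper, not part of EasyCL, and the file it reads is whatever the vendor's offline compiler produced. The sketch deliberately stops before any OpenCL call. In standard OpenCL the loaded bytes would be handed to `clCreateProgramWithBinary`, whereas runtime compilation goes through `clCreateProgramWithSource`.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical runtime-side helper: read a kernel binary that was compiled
// offline (hours earlier) instead of compiling OpenCL source at startup.
std::vector<unsigned char> loadKernelBinary(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open kernel binary: " + path);
    }
    // Read the whole file into memory, byte for byte.
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());
    // A real host program would now pass bytes.data() and bytes.size()
    // to clCreateProgramWithBinary (not called in this sketch).
    return bytes;
}
```

A caller would invoke `loadKernelBinary("kernels.bin")` at startup, where the file name is a placeholder for the offline compiler's output, and fail fast if the file is missing rather than fall back to runtime compilation.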

hughperkins avatar Dec 02 '16 23:12 hughperkins

(But note that I have zero experience with FPGAs, so I don't really know. You should check for yourself how compilation on an FPGA works.)

hughperkins avatar Dec 03 '16 10:12 hughperkins

You are absolutely right. I have to compile the kernel code (OpenCL) offline, which usually takes 10 to 15 hours. I have been reading the source code and comparing it with the Altera OpenCL examples. As you said, I don't think a massive code change is needed, but I do need to partition the code.

Thanks.

tonyzhenyuxu avatar Dec 05 '16 18:12 tonyzhenyuxu

Cool :-)

hughperkins avatar Dec 05 '16 21:12 hughperkins

I am digging further into the source code and trying to build AlexNet, but I realize there is no stride (or stride = 1) in your implementation. Is that true? For example, the first layer of AlexNet has filter size = 11 x 11 and 96 feature maps (I guess that is numFilters here), with stride = 4, so the output is 55 x 55 x 96. I don't know how we can do that in DeepCL.

Thanks.

tonyzhenyuxu avatar Dec 09 '16 00:12 tonyzhenyuxu

Can you help me understand the following code, in the deriveOthers function in LayerDimensions.cpp?

The following line puzzles me:

this->outputSize = padZeros ? (filterSize % 2 == 0 ? inputSize / (skip + 1) + 1 : inputSize / (skip + 1)) : (inputSize - filterSize) / (skip + 1) + 1;

I am wondering whether the padZeros branch

(filterSize % 2 == 0 ? inputSize / (skip + 1) + 1 : inputSize / (skip + 1))

could instead be

(filterSize % 2 == 0 ? (inputSize - filterSize) / (skip + 1) + 1 : (inputSize - filterSize) / (skip + 1))

It seems to me that skip + 1 = stride; is that right? There is not much explanation of the dimensions in the code. Would you mind sparing a few minutes on this?

Appreciated.

-T

tonyzhenyuxu avatar Dec 10 '16 00:12 tonyzhenyuxu

Yes, skip + 1 is stride. 'Skip' is from a relatively old paper. 'Stride' is the common notation nowadays.
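
With skip + 1 read as stride, the expression quoted in the question can be written out as a small standalone function. This is just a transcription of that quoted line for illustration: the free function `deriveOutputSize` and its signature are hypothetical (in the thread it is an assignment inside deriveOthers), and it has not been checked against the current DeepCL source.

```cpp
// Transcription of the outputSize expression quoted in this thread.
// All divisions are integer divisions, as in the original expression.
int deriveOutputSize(int inputSize, int filterSize, int skip, bool padZeros) {
    const int stride = skip + 1;  // 'skip' is the older name; stride = skip + 1
    if (padZeros) {
        // Zero-padded case: depends only on input size, stride,
        // and whether the filter size is even or odd.
        return (filterSize % 2 == 0) ? inputSize / stride + 1
                                     : inputSize / stride;
    }
    // Unpadded case: the filter must fit entirely inside the input.
    return (inputSize - filterSize) / stride + 1;
}
```

For the AlexNet first layer mentioned above, and assuming a 227 x 227 input (the thread does not state the input size), `deriveOutputSize(227, 11, 3, false)` evaluates to (227 - 11) / 4 + 1 = 55, matching the 55 x 55 x 96 output. So under this expression, stride 4 corresponds to skip = 3.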

hughperkins avatar Dec 10 '16 04:12 hughperkins

Hi, I would also like to extend this code to support FPGAs.
I use a Terasic C5P, whose chip is an Altera Cyclone V.
I can run some simple demos, such as hello world and vector_add, but when I call clinfo, it reports:

root@up2:~# clinfo
I: ICD loader reports no usable platforms

Is there some hope? Thanks very much

lihao2333 avatar Jun 15 '18 13:06 lihao2333

Kernels for FPGAs need to be compiled offline and burned onto the FPGA. This can take several hours. Then, once they are burned in, you can run them.

You would need to modify your code to support two stages like this. And EasyCL too.

hughperkins avatar Jun 15 '18 21:06 hughperkins

@lihao2333

Oh. Re-reading this now that I have access to a web browser, rather than just replying to an email: right, you would need to find an OpenCL driver, and its ICD registration, for your FPGA. You could, for example, ask your FPGA vendor's customer support, or search their forums.
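
As a first diagnostic for the "no usable platforms" message, it may help to see what is actually registered with the ICD loader. On Linux, the Khronos ICD loader conventionally looks for `.icd` files in `/etc/OpenCL/vendors`, each holding the name or path of a vendor's OpenCL library. The sketch below (C++17) lists those entries. `listIcdEntries` and `IcdEntry` are hypothetical names, the default directory is an assumption about your system, and which `.icd` file the FPGA vendor's SDK should install is not something this thread establishes.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// One registration: the .icd file and the library name/path it contains.
struct IcdEntry {
    std::string icdFile;
    std::string library;
};

// Hypothetical helper: list ICD registrations found in vendorDir.
// Returns an empty list if the directory does not exist.
std::vector<IcdEntry> listIcdEntries(
        const std::string &vendorDir = "/etc/OpenCL/vendors") {
    std::vector<IcdEntry> entries;
    std::error_code ec;
    if (!fs::is_directory(vendorDir, ec)) {
        return entries;
    }
    for (const auto &item : fs::directory_iterator(vendorDir, ec)) {
        if (item.path().extension() != ".icd") {
            continue;  // ignore anything that is not an ICD registration
        }
        std::ifstream in(item.path());
        std::string library;
        std::getline(in, library);  // first line holds the library name/path
        entries.push_back({item.path().string(), library});
    }
    std::sort(entries.begin(), entries.end(),
              [](const IcdEntry &a, const IcdEntry &b) {
                  return a.icdFile < b.icdFile;
              });
    return entries;
}
```

If this returns nothing, or nothing that points at the FPGA vendor's library, that would be consistent with clinfo finding no platforms, and the vendor's installation instructions are the place to look next.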

hughperkins avatar Jun 16 '18 19:06 hughperkins