EasyCL
adding fpga support
Could you please give me your comments on supporting OpenCL on an FPGA device, such as the Altera Arria 10, instead of a GPU? Thanks.
FPGAs typically work somewhat differently from discrete GPUs, in that programming them takes a very long time: hours.
For a discrete GPU, such as one from AMD or NVIDIA, the workflow for OpenCL looks something like:
1. program starts
2. program initializes GPU device
3. program loads OpenCL source code, which basically looks like C code
4. program gives OpenCL code to GPU driver, which compiles it to GPU object code, passes it to the GPU, and passes a handle back to the program
5. program gives program handle to the GPU driver, along with some data, and GPU starts processing the data, following the logic in the program
For a discrete GPU, steps 1 to 3 take seconds. Step 4 takes seconds, or less. Step 5 takes as long as it takes: minutes, hours, days, or weeks, depending on what you're doing or training.
For an FPGA, step 4 takes significantly longer: hours instead of seconds. So the workflow would be quite different. The compilation of the OpenCL essentially has to happen offline, rather than at runtime.
It's probably not a massively blocking change, but it would need some rethinking of how the program runs. For example, EasyCL currently assumes that the OpenCL will be compiled at runtime. You'd need to partition EasyCL into two parts:
- an offline compilation part, which you'd run once, during compilation, and which would store the FPGA program somewhere. You'd then write the FPGA program to the FPGA, I believe
- a runtime part, which would simply execute the program on the already-programmed FPGA
(But note that I have zero experience with FPGAs, so I don't really know. You should check for yourself how compilation on an FPGA works.)
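To make the runtime half of that split a little more concrete, here is a minimal sketch, under assumptions, of reading a precompiled kernel binary from disk instead of compiling source at runtime. `loadKernelBinary` is a hypothetical helper, not part of EasyCL. The bytes it returns would then be handed to the standard OpenCL call `clCreateProgramWithBinary`, which is omitted here so the sketch stays self-contained.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of EasyCL): read a kernel binary that was
// produced offline, so the runtime never has to compile OpenCL source.
std::vector<unsigned char> loadKernelBinary(const std::string &path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open kernel binary: " + path);
    }
    // Copy the whole file into memory, byte for byte.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(file),
                                      std::istreambuf_iterator<char>());
}
```

A caller would pass whatever path the offline step wrote its output to; where that file lives and what it is called depends on the FPGA toolchain.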
You are absolutely right. I have to compile the kernel code (OpenCL) offline. It usually takes 10 to 15 hours. I have been reading the source code and comparing it with the Altera OpenCL examples. As you said, I don't think a massive code change is needed, but I do indeed need to partition the code.
Thanks.
Cool :-)
I am digging further into the source code and trying to build AlexNet, but I realize there is no stride (or stride = 1) in your implementation. Is that true? For example, the first AlexNet layer has filter size = 11 x 11 and 96 feature maps (I guess here we can use numFilters), with stride = 4, so we have a 55 x 55 x 96 output. I don't know how we can do that in DeepCL.
Thanks.
Can you help me understand the following code, in the deriveOthers function in LayerDimensions.cpp?
We have the following line (which puzzles me): this->outputSize = padZeros ? (filterSize % 2 == 0 ? inputSize / (skip + 1) + 1 : inputSize / (skip + 1)) : (inputSize - filterSize) / (skip + 1) + 1;
I am wondering if (filterSize % 2 == 0 ? inputSize / (skip + 1) + 1 : inputSize / (skip + 1)) could instead be: (filterSize % 2 == 0 ? (inputSize - filterSize) / (skip + 1) + 1 : (inputSize - filterSize) / (skip + 1))
It seems to me that skip + 1 = stride; is that right? There is not much explanation of the dimensions in the code. Would you mind sparing a few minutes on this?
Appreciated.
-T
Yes, skip + 1 is stride. 'Skip' is from a relatively old paper. 'Stride' is the common notation nowadays.
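As a rough illustration of how the quoted expression behaves, here is a standalone sketch that copies it into a free function. `deriveOutputSize` is a hypothetical name; the real code is a member function that sets `this->outputSize`, and nothing here is taken from DeepCL beyond the expression quoted in this thread.

```cpp
// Hypothetical standalone copy of the expression quoted in this thread;
// parameter names follow the thread, but this is not DeepCL's actual function.
int deriveOutputSize(int inputSize, int filterSize, int skip, bool padZeros) {
    const int stride = skip + 1;  // skip + 1 is the stride, as confirmed above
    if (padZeros) {
        // With zero padding, the filter size only matters through its parity.
        return filterSize % 2 == 0 ? inputSize / stride + 1 : inputSize / stride;
    }
    // Without padding: count the positions where the whole filter fits.
    return (inputSize - filterSize) / stride + 1;
}
```

For the AlexNet first layer mentioned earlier, assuming a 227 x 227 input (the input size is not stated in this thread), `deriveOutputSize(227, 11, 3, false)` gives (227 - 11) / 4 + 1 = 55, which matches the 55 x 55 output described.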
Hi, I also want to improve this code to support FPGAs.
I use a Terasic C5P, whose chip is an Altera Cyclone V.
I can run some simple demos, such as hello world and vector_add, but when I call clinfo, it reports:
root@up2:~# clinfo
I: ICD loader reports no usable platforms
Is there some hope? Thanks very much
For FPGAs, the kernels need to be compiled offline and burned onto the FPGA. This can take several hours. Then, once they are burned in, you can run them.
You would need to modify your code to support two stages like this, and EasyCL too.
@lihao2333
Oh. Re-reading, now that I have access to a web browser and am not just replying to an email: OK, right, you would need to find an OpenCL driver, and ICD registration, for your FPGA. You could, for example, ask the customer support for your FPGA, or perhaps search their forums.
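One way to check the ICD registration side: on Linux, the Khronos ICD loader typically discovers vendor drivers through `.icd` files in `/etc/OpenCL/vendors`, each naming a vendor library to load. The sketch below lists the `.icd` files in a given directory, so an empty result for that directory would be consistent with the "no usable platforms" message. It assumes a POSIX system; both function names are hypothetical, and the exact path used by a particular FPGA toolchain may differ.

```cpp
#include <dirent.h>

#include <string>
#include <vector>

// Hypothetical helper: true if the file name ends in ".icd".
bool hasIcdSuffix(const std::string &name) {
    const std::string suffix = ".icd";
    return name.size() >= suffix.size() &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: names of the .icd files found in `dir`.
// Returns an empty list if the directory is missing or unreadable.
std::vector<std::string> listIcdFiles(const std::string &dir) {
    std::vector<std::string> found;
    DIR *handle = opendir(dir.c_str());
    if (handle == nullptr) {
        return found;
    }
    while (dirent *entry = readdir(handle)) {
        if (hasIcdSuffix(entry->d_name)) {
            found.push_back(entry->d_name);
        }
    }
    closedir(handle);
    return found;
}
```

Calling `listIcdFiles("/etc/OpenCL/vendors")` and printing the result would show which vendors, if any, are registered on the machine.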