CR-1128916 Download xclbin failed for pre-emptible kernel

Open rbramand-xilinx opened this issue 2 years ago • 11 comments

Problem solved by the commit

While downloading an xclbin on a pre-emptible kernel, copy_from_user fails to copy buffers larger than 4K bytes. Fixed by copying 4K chunks at a time when preemption is enabled.

Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered

Fixed failures when copying xclbin sections from a user pointer.

How problem was solved, alternative solutions (if any) and why they were rejected

Copying the buffer in 4K chunks resolved the issue.
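For readers following along, the chunking idea can be sketched in user space. This is not the code from the commit: `fake_copy_from_user`, `copy_in_chunks`, `fake_calls`, and `CHUNK_SIZE` are hypothetical names, and the stand-in simply calls `memcpy` so the loop can be run outside the kernel. In the driver, the real `copy_from_user` and `PAGE_SIZE` would take their place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4096u /* assumption: stands in for PAGE_SIZE (4K) */

static unsigned long fake_calls; /* counts calls to the stand-in below */

/*
 * Hypothetical user-space stand-in for copy_from_user().
 * Like the kernel function, it returns the number of bytes NOT copied,
 * so 0 means success. This version always succeeds.
 */
static unsigned long fake_copy_from_user(void *to, const void *from,
                                         unsigned long n)
{
    fake_calls++;
    memcpy(to, from, n);
    return 0;
}

/*
 * Copy `size` bytes from src to dst, at most CHUNK_SIZE bytes per call.
 * Returns 0 on success, -1 as soon as one chunk is not fully copied.
 */
static int copy_in_chunks(void *dst, const void *src, size_t size)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t off = 0;

    assert(dst != NULL && src != NULL);

    while (off < size) {
        size_t n = size - off;

        if (n > CHUNK_SIZE)
            n = CHUNK_SIZE;
        if (fake_copy_from_user(d + off, s + off, n) != 0)
            return -1;
        off += n;
    }
    return 0;
}
```

With a 10000-byte buffer, the loop issues three calls (4096, 4096, and 1808 bytes). The sketch only shows the mechanics of chunking; it does not explain why a single large copy would fail.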

Risks (if any) associated with the changes in the commit

None, as the change only takes effect under the preemption-enabled condition.

What has been tested and how, request additional testing if necessary

Tested the yolov4 application with preemption enabled; the test passes.

Documentation impact (if any)

NA

rbramand-xilinx avatar Jun 13 '22 13:06 rbramand-xilinx

Build Failed! :(

gbuildx avatar Jun 13 '22 13:06 gbuildx

This is just a workaround till we find the real root cause, right?

maxzhen avatar Jun 13 '22 16:06 maxzhen

retest this please

salindac avatar Jun 13 '22 19:06 salindac

Hi @rbramand-xilinx, can you explain why copy_from_user() doesn't work with CONFIG_PREEMPT when the size is larger than 4K? I couldn't find any helpful information on the internet or in the Linux source code. Please share what you know.

mamin506 avatar Jun 13 '22 20:06 mamin506

Build Passed!

gbuildx avatar Jun 13 '22 20:06 gbuildx

Hi @mamin506, @maxzhen, @houlz0507. copy_from_user inside zocl_read_sect function fails for all sections that have size greater than 4K(PAGE_SIZE) bytes when preemption is enabled (AIE_RESOURCES and AIE_METADATA in this case). I also came across some pages that discuss copy_from_user and 4K (page-size) buffers, e.g. https://www.spinics.net/lists/newbies/msg00058.html. So I experimented with copying 4K chunks at a time; copy_from_user then succeeds, which is why I made this change. There are not many articles about copy_from_user behavior when preemption is enabled. Also, the user application uses threads, and reading xclbin sections fails only when the xclbin is programmed from the application, not from the command line. Please let me know your thoughts on this.

rbramand-xilinx avatar Jun 14 '22 14:06 rbramand-xilinx

Can you please elaborate on "copy_from_user inside zocl_read_sect function fails for all sections that have size greater than 4K(PAGE_SIZE) bytes"? What failure are we talking about? I understand that if the user buffer crosses a 4K boundary, copy_from_user may sleep. But that does not necessarily mean we cannot pass a user buffer that crosses a page boundary. Just make sure we are not in atomic context.

larry9523 avatar Jun 14 '22 15:06 larry9523

copy_from_user is returning a nonzero number of bytes that it failed to copy, and this happens only when preemption is enabled. It fails only for sections larger than 4KB, so I changed the code to copy 4K chunks at a time. Also, we are not in atomic context while copying.
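To make the failure mode concrete: copy_from_user returns the number of bytes it could not copy, so a nonzero return is the failure being described. The sketch below is a user-space illustration only, not driver code. `sim_copy_from_user`, `read_section`, and the `readable` parameter are hypothetical names introduced here to simulate a partial copy and to show how the return value can be turned into a byte count worth logging while debugging.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_from_user(). It copies at most
 * `readable` bytes and, like the kernel function, returns the number
 * of bytes that were NOT copied (0 means everything was copied).
 */
static unsigned long sim_copy_from_user(void *to, const void *from,
                                        unsigned long n,
                                        unsigned long readable)
{
    unsigned long ok = n < readable ? n : readable;

    memcpy(to, from, ok);
    return n - ok;
}

/*
 * Copy one section and report how far the copy got. Returns 0 on
 * success or -EFAULT if any bytes were left uncopied; *copied always
 * holds the number of bytes that did arrive.
 */
static int read_section(void *dst, const void *src, size_t size,
                        size_t readable, size_t *copied)
{
    unsigned long left;

    assert(copied != NULL);

    left = sim_copy_from_user(dst, src, size, readable);
    *copied = size - left;
    return left ? -EFAULT : 0;
}
```

Logging the requested size together with the uncopied count for each failing section would show whether the copy stops at a page boundary, which may help narrow down the root cause.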

rbramand-xilinx avatar Jun 14 '22 16:06 rbramand-xilinx

We'd better find out why it fails before we write code to work around it.

larry9523 avatar Jun 14 '22 16:06 larry9523

Sure, @larry9523. I will debug further to find out why it fails and get back to you.

rbramand-xilinx avatar Jun 14 '22 16:06 rbramand-xilinx

Marking this PR as do-not-merge until it is root caused.

maxzhen avatar Jun 14 '22 16:06 maxzhen

Please raise a new PR once the reason behind this issue has been identified. Closing this PR.

chvamshi-xilinx avatar Sep 24 '22 05:09 chvamshi-xilinx