XRT
Workaround dead-lock in hardware-emulation shutdown
Problem solved by the commit
Workaround dead-lock in hardware-emulation shutdown
Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered
Just running some simple SYCL code on xsjsycl41 server.
How problem was solved, alternative solutions (if any) and why they were rejected
This is solved by an outrageous hack.
The user needs to define XRT_PCIE_HW_EMU_FORCE_SHUTDOWN
environment variable before running an XRT program.
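As a minimal sketch of what "define the variable before running" could look like from a launcher script: the variable name comes from this PR, but the value `"1"` is an assumption (the description only says the variable must be defined), and the helper names are hypothetical.

```python
import os
import subprocess

# Variable name taken from the PR description.
FORCE_SHUTDOWN_VAR = "XRT_PCIE_HW_EMU_FORCE_SHUTDOWN"


def env_with_force_shutdown(base_env=None, value="1"):
    """Return a copy of the environment with the workaround variable defined.

    The value "1" is an assumption: the PR only requires the variable
    to be defined and does not say which value, if any, matters.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[FORCE_SHUTDOWN_VAR] = value
    return env


def run_with_force_shutdown(command):
    """Run an XRT program (given as an argument list) with the variable set."""
    return subprocess.run(command, env=env_with_force_shutdown(), check=False)
```

For example, `run_with_force_shutdown(["./my_sycl_app"])`, where `./my_sycl_app` is a placeholder for the hardware-emulation binary, starts the program with the variable defined without changing the parent shell's environment.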
Risks (if any) associated with the changes in the commit
This is probably not to be merged, but to be used as a branch by other people hitting this problem until there is a real fix.
What has been tested and how, request additional testing if necessary
This works for us™. :-)
Documentation impact (if any)
Can one of the admins verify this patch?
ok to test
Build Failed! :(
retest this please
Build Failed! :(
retest this please - infra errors
Build Passed!
Build Passed!
Rebased on master, just before https://github.com/Xilinx/XRT/pull/6297, to avoid a test that does not work with PS kernel.
Build Failed! :(
retest this please.
Build Passed!
While I understand the desire here, I think this is a hack around an underlying problem. @akasat @venkatp-xilinx shouldn't you be addressing the hang itself?
Yes Soren, we have a CR for this, and we will handle it as part of the CR fix.
Added hemantk-xilinx as reviewer. The basic issue is in the hw_emu model, which is being worked on by the hw emulation team. The solution seems very hacky, we do not want to merge the change.
> The solution seems very hacky, we do not want to merge the change.
I totally agree, as suggested in the PR description. ;-)
@keryell, can we close this PR, as it has been open for a long time with no updates?
Is there any alternative? Is this problem solved otherwise?
> Is there any alternative? Is this problem solved otherwise?
I am not sure whether this is solved or not. @venkatp-xilinx, can you please take a look and confirm? @keryell, can you please file a CR and inform @venkatp-xilinx?
> Is there any alternative? Is this problem solved otherwise?
This is resolved now: the hw_emu model fixed the double-free memory corruption issue, and with that we should not see this issue any more. As a general fix in the driver, we are trying various ways to identify whether the socket is still alive (when the device process abruptly crashes) and do a clean exit from the driver. We are actively working on it and will provide a proper solution in the master branch shortly.
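For readers unfamiliar with the idea of checking whether a socket peer is still alive, one common generic approach is to poll the socket and peek for end-of-file. The sketch below is not the XRT driver's code, and `peer_is_alive` is a hypothetical helper; it only illustrates how a process can notice that the other end has gone away instead of blocking forever.

```python
import select
import socket


def peer_is_alive(sock, timeout=0.0):
    """Best-effort check that the peer of a connected stream socket is still there.

    Returns False once the peer has closed the connection (an orderly
    close is seen as end-of-file, an abrupt one as a connection reset).
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        # Nothing pending: no evidence that the peer has gone away.
        return True
    try:
        # MSG_PEEK looks at pending data without consuming it.
        data = sock.recv(1, socket.MSG_PEEK)
    except (ConnectionResetError, BrokenPipeError):
        return False
    # An empty read on a readable socket means the peer closed its end.
    return len(data) > 0
```

A shutdown path could call such a check before waiting on a reply and exit cleanly when it returns False. Note that it cannot detect a peer that is hung but still holds the socket open; that case needs a timeout or heartbeat.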
Build Failed! :(
@keryell Per @venkatp-xilinx's comment, is this PR no longer needed?
Build Passed!
I believe this has been addressed to some degree :-)
> I believe this has been addressed to some degree :-)
Our CI might be wrong then. :-( I have not seen the rewrite of all the XRT-emulation layer either... :-(