XRT
XRT copied to clipboard
Fix RC when accessing singleton's sessions vector
Problem solved by the commit
There are 2 threads xma_thread1
and xma_thread2
reading from a vector of sessions inside the g_xma_singleton
. The access was not guarded by any mutex while other threads could access the sessions vector and push a new element, this sometimes was causing a race condition where the thread was looping over elements and a new elements was being added causing a vector resize. This automatically invalidated any existing vector iterator (which the threads were using for reading from vector) which resulted in a segfault when trying to dereference it.
Bug / issue (if any) fixed, which PR introduced the bug, how it was discovered
The bug has been discovered when running a lot of quite short-lived sessions which caused the application to exit due to segfaults.
How problem was solved, alternative solutions (if any) and why they were rejected
Locking a mutex before accessing a list of sessions.
Risks (if any) associated with the changes in the commit
Probably none, the mutex could be replaced with std::shared_mutex
as there are probably more reads than writes but with that low frequency of locking it, it is probably not worth it.
What has been tested and how, request additional testing if necessary
Documentation impact (if any)
Can one of the admins verify this patch?
@vboggara-xilinx This is causing segfaults for us and blocking us from using Xilinx in prod. Do you happen to know when this may be merged and then released?
Thanks!
@dbenusov-xilinx I've moved the lock acquisition before the cpu_mode read as you wished. But when reading the comment of the XMA_CPU_MODE2
flag being #define XMA_CPU_MODE2 2 //High cpu load + high performance
I'm not sure if that is good idea to lock and then sleep for 3 milliseconds. Maybe it should lock, check the flag, unlock, sleep and lock afterwards?
That is a really good point I did not notice that, good catch. Try making the cpu_check an atomic variable and moving the lock_guard back to its original spot. Hopefully everything should work the same :)
retest this please.
Build Passed!
changes looks good to me. we can merge this change.
Can this finally be merged if all tests are passing?