`usleep()` to synchronize host/engine code does not work with simulator
It seems I can no longer synchronize between host and AI engine using timeouts (in the simulator). Let's say a core writes a value to memory. Even after large timeouts (e.g. 10 seconds), that value does not appear in the host read. I believe this used to work previously. (Using locks it works.)
I know we should use locks to synchronize instead of sleep, but to me this still seems indicative of a bug. Also, tutorial 1 still uses usleep() to synchronize host and AI engine code and currently breaks for me (simulation reports "FAIL!").
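For reference, the lock-based pattern that does work for me looks roughly like this (a sketch only: the helper mlir_aie_acquire_lock_a14 and its return convention are hypothetical stand-ins, since the acquire/release helpers are generated per named lock in the design; substitute the names from your own aie_inc.cpp):

mlir_aie_start_cores(_xaie);
// Block (with a timeout, in microseconds) until the core releases the lock,
// i.e. until it has finished writing the buffer. Hypothetical helper name and
// return convention; check your generated aie_inc.cpp for the actual ones.
if (mlir_aie_acquire_lock_a14(_xaie, /*value=*/1, /*timeout=*/10000)) {
  int32_t val = mlir_aie_read_buffer_a14(_xaie, 3); // safe to read now
  printf("buf[3] = %d\n", val);
}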
What's curious to me: continuously polling until I see the expected value, instead of sleeping for a fixed time, does work. Even more curious: it always takes exactly the same number of iterations (40 for tutorial 1) before the value percolates to the host. Adding usleep() calls inside the polling loop makes everything slower, but it still takes exactly 40 iterations until the core's write becomes visible (i.e. the 40th host read always reflects the core write).
This leads me to think that, in the simulator, the AI core does not truly execute in parallel with the host?
Please let me know if I am doing something wrong. Maybe I need to flush AI core memory to the host somehow, but it seems weird that I do not have to do that when using locks/polling. Tutorial 1 also does no such thing.
tutorials/tutorial-1 is a minimal working example of this issue. Here is a diff that adds polling to that tutorial and circumvents the issue for me (it also reports timing information, since I was trying to figure out what a good value for the usleep might be):
--- a/tutorials/tutorial-1/test.cpp
+++ b/tutorials/tutorial-1/test.cpp
@@ -19,6 +19,7 @@
#include <thread>
#include <unistd.h>
#include <xaiengine.h>
+#include <sys/time.h>
#include "aie_inc.cpp"
@@ -66,7 +67,17 @@ int main(int argc, char *argv[]) {
mlir_aie_start_cores(_xaie);
// Wait time for cores to run. Number used here is much larger than needed.
- usleep(100);
+ struct timeval tv_start, tv_current, tv_diff;
+ gettimeofday(&tv_start, NULL);
+ int32_t val = 0;
+ const int32_t expected_val = 14;
+ do {
+ val = mlir_aie_read_buffer_a14(_xaie, 3);
+ gettimeofday(&tv_current, NULL);
+ timersub(&tv_current, &tv_start, &tv_diff);
+ printf("[%2d.%06d] val = %d\n", tv_diff.tv_sec, tv_diff.tv_usec, val);
+ } while(val != expected_val);
+
// Check buffer at index 3 again for expected value of 14
printf("Checking buf[3] = 14.\n");
Any idea what changed? Perhaps you moved to a new version of Vitis? It's somewhat interesting that the behavior changed, but it's not entirely unexpected that the test is fundamentally racy.
Thank you, Stephen.
I did not update Vitis, I just rebuilt this repository. Which is weird, because I feel like the bug would more likely be in libXAIE or the simulator itself. I can try to bisect this repository, but I'm assuming this is probably low priority.
I agree that usleep is "racy", but I don't think that's what causes the bug, for two reasons:
First, with an unreasonably large timeout (e.g. 10 s) I would expect the AI engine core to always win the race. If the core truly ran in parallel with the host, it should complete a simple assignment to a buffer in far less than 10 s (it does not wait on any locks or anything like that). Instead, the core seems to sleep for those 10 s as well!
Second, the behavior looks very deterministic when I poll: the read shows the value after exactly 40 iterations, every time.
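If that is what is happening, the symptoms are easy to reproduce with a toy model (pure speculation about the scheduling on my part, not the simulator's actual mechanism): if each host-side read steps the simulated core forward by a fixed quantum instead of the core running concurrently, the write becomes visible after a fixed number of reads, and host-side sleeping changes nothing:

#include <cstdio>

// Toy model: the "core" only advances when the host reads, one quantum per
// read. The 40 here is chosen to match the iteration count I observe.
struct SimCore {
  int cycles_needed = 40;
  int cycles_done = 0;
  int buf = 0;
  int host_read() {
    if (cycles_done < cycles_needed && ++cycles_done == cycles_needed)
      buf = 14; // the core's store "lands" on the 40th quantum
    return buf;
  }
};

int main() {
  SimCore sim;
  int reads = 1;
  while (sim.host_read() != 14) ++reads; // a usleep() here changes nothing
  printf("value visible on read #%d\n", reads); // always prints 40
}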
Am I the only one seeing this problem with tutorial 1?