Enable all possible leon3 bare-metal tests
The following multicore bare-metal tests from GrLib were enabled and tested during the development of the Spandex LLC at UIUC, using a quad-core leon3 configuration. The enabled tests were also verified on an original ESP quad-core system. When base_test() is called from systest.c in the design folder, CPU 0 should print a report of all tasks successfully executed by each CPU.
Hi, thanks for submitting the pull request!
I ran the RTL simulation of the base_test() in the pull request with 4 Leon3 cores, but either the simulation got stuck or it needs more than two days to complete. When you verified it on the original ESP quad-core system, roughly how long did your simulation take? Did you test the app with the latest version of ESP?
If the runtime of the simulation is really that long, I suggest shortening it as much as possible (e.g. do not repeat the more time-consuming tests multiple times). Additionally, it would be useful to add some prints issued only by CPU0, to keep track of the state of the simulation. This can be done at the granularity of each test in leon3_test.c or, in some cases, at a finer granularity.
Thank you!
Hi Davide, I did not run these tests in the simulator because I figured they would take too long. Could you try running them on the FPGA and see if they actually get stuck? They were working fine for me. Once we've verified they work on the FPGA, we can reduce the iteration counts for the simulator.
Thank you!
I tested on a Xilinx VC707 with 4 CPU tiles, but it still gets stuck, with or without the new prints you added. This is the behavior I observe; what's your setup for this test?
- Without new prints:
Start testing on 4 CPUs.
- With new prints:
Start testing on 4 CPUs.
Finished multest.
Finished divtest.
Finished cache_fill with BYTE granularity.
Finished cache_fill with HALFWORD granularity.
Finished cache_fill with WORD granularity.
Thank you.
I am using the same board as you are.
I do not recall which commit I ran these tests on. I'll re-do the test on the latest version and see what happens.
Is your FPGA test hanging every time you run it? Does it ever finish?
It always hangs, and judging by the terminal output it is likely getting stuck in the same place every time.
Thank you for that info. I do not have an ESP implementation at hand, and I'm compiling one as I type. However, I just ran the test with Spandex and it worked fine. Are you using the ESP RTL cache or the ESP SystemC cache?
I'm actually using the RTL cache; I can try with the SystemC cache and see if there are any differences.
I remember testing it in the past with the ESP SystemC cache. I have never tried the ESP RTL cache, but I'll give it a try now and see if anything goes wrong.
I just ran the tests with ESP RTL caches and got the same hanging behavior as yours. I'm now compiling a new design with SystemC caches.
I ran the test with SystemC caches and got the following output.
Start testing on 4 CPUs.
Finished multest.
Finished divtest.
I am afraid some other bugs are still present in the system. As a sanity check, I will merge the HEAD of ESP into Spandex and see whether it hangs.
Ok, so I won't merge this pull request, since the tests do not work in ESP. Still, this is useful: it may help us find a bug in the system.
We'll take a look on our side to see if we can find the problem. By the way, are you sure about the position of if (!pid) data_structures_setup();? It seems to me it should be called earlier, before the other cores wake up.
Yes, the data_structures_setup routine calls malloc to allocate the buffers used by the later cache_fill tests.
I ran the tests with Spandex and it is working for me.
How many ways does your L2 cache configuration have? I just realized that the "ways" parameter passed to cache_fill shouldn't be hardcoded to 4. If your configuration has more or fewer than 4 ways, could you rerun the modified test?
Thank you!
We reproduced the issue, and some debugging revealed a corner case that appears not to be handled correctly: two consecutive casa instructions targeting two different words of the same cache line.
We're working on a bug fix and we'll post here when done.
Thank you for the update! Which cache level does this bug occur in? I'm concerned that if it's in the L2, Spandex might also be affected.
At the moment it seems to be in the private L2. If that's the case, Spandex may have the same problem, even if it doesn't manifest because of different timing. We'll know more once we confirm and fix the bug.
I see. Some of the ESP L2 states are unreachable in Spandex; that could also explain why the bug was not being triggered.