Enable all possible leon3 bare-metal tests

Open zzhu35 opened this issue 5 years ago • 16 comments

The following multicore baremetal tests from GrLib were enabled and tested during the development of Spandex LLC at UIUC using a quad-core leon3 configuration. The enabled tests were also verified on an original ESP quad-core system. When calling base_test() from systest.c in the design folder, CPU 0 should print a report of all successfully executed tasks by each CPU.

zzhu35 avatar Apr 17 '20 03:04 zzhu35

Hi, thanks for submitting the pull request!

I ran the RTL simulation of the base_test() in the pull request with 4 Leon3 cores, but either the simulation got stuck or it needs to run for more than two days. When you verified it on the original ESP quad-core system, how long did your simulation take approximately? Did you test the app with the latest version of ESP?

If the simulation really takes that long, I suggest shortening it as much as possible (e.g. do not repeat the more time-consuming tests multiple times). Additionally, it would be useful to add some prints done only by CPU 0, to keep track of the state of the simulation. This can be done at the granularity of each test in leon3_test.c, or at a finer granularity in some cases.
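A minimal sketch of the CPU 0-only progress print suggested above. The helper name `report_progress` is hypothetical, and `pid` is assumed to be the index of the calling core, as in the `if (!pid)` check used elsewhere in the test; the actual code in leon3_test.c may be organized differently.

```c
#include <stdio.h>

/* Hypothetical helper (not part of GRLIB or ESP): print a progress line
 * only on CPU 0 so the output of the cores does not interleave.
 * `pid` is assumed to be the index of the calling core.
 * Returns the number of characters printed, or 0 on every other core. */
int report_progress(int pid, const char *test_name)
{
    if (pid != 0)
        return 0;                       /* other cores stay silent */
    return printf("Finished %s.\n", test_name);
}
```

Calling `report_progress(pid, "multest")` after each test would give one line per completed test, which is enough to see where a long simulation or an FPGA run stops making progress.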

Thank you!

davide-giri avatar Apr 20 '20 04:04 davide-giri

Hi Davide, I did not run these tests in the simulator because I figured they would take too long. Could you try running them on the FPGA and see if they actually get stuck? It was working fine for me. Once we have verified it's working on the FPGA, we can reduce the numbers for the simulators.

Thank you!

zzhu35 avatar Apr 20 '20 16:04 zzhu35

I tested on a Xilinx VC707 with 4 CPU tiles, but it still gets stuck with or without the new prints you added. This is the behavior I observe. What's your setup for this test?

  • Without new prints
Start testing on 4 CPUs.
  • With new prints:
Start testing on 4 CPUs.
Finished multest.
Finished divtest.
Finished cache_fill with BYTE granularity.
Finished cache_fill with HALFWORD granularity.
Finished cache_fill with WORD granularity.

davide-giri avatar Apr 22 '20 08:04 davide-giri

Thank you.

I am using the same board as you are.

I do not recall which commit I ran these tests on. I'll re-do the test on the latest version and see what happens.

Is your FPGA test hanging every time you run it? Does it ever finish?

zzhu35 avatar Apr 22 '20 15:04 zzhu35

It always hangs, and according to the terminal output it seems to get stuck in the same place every time.

davide-giri avatar Apr 22 '20 16:04 davide-giri

Thank you for that info. I do not have an ESP implementation on hand and I'm compiling one as I type. However, I just ran the test with Spandex and it was working fine. Are you using the ESP RTL cache or the ESP SystemC cache?

zzhu35 avatar Apr 22 '20 16:04 zzhu35

I'm actually using the RTL cache, I can try with the SystemC cache and see if there are any differences.

davide-giri avatar Apr 22 '20 16:04 davide-giri

I remember in the past I tested it using an ESP SystemC cache. I have never tried using ESP RTL cache but I'll give it a try now to see if anything went wrong.

zzhu35 avatar Apr 22 '20 17:04 zzhu35

I just ran the tests with ESP RTL caches and got the same hanging behavior as yours. I'm now compiling a new design with SystemC caches.

zzhu35 avatar Apr 23 '20 00:04 zzhu35

I ran the test with SystemC caches and got the following output.

Start testing on 4 CPUs.
Finished multest.
Finished divtest.

I am afraid some other bugs are still present in the system. Just for a sanity check I will merge the HEAD of ESP into Spandex and see if it hangs or not.

zzhu35 avatar Apr 23 '20 17:04 zzhu35

Ok, so I won't merge this pull request because the tests do not work in ESP. Anyway this is useful to potentially find a bug in the system.

We'll take a look on our side to see if we find the problem. By the way are you sure about the position of if (!pid) data_structures_setup();? It seems to me it should be called earlier, before the other cores wake up.
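One way to make the ordering concern explicit, independent of when the other cores wake up, is to have them wait on a flag that CPU 0 sets after setup. This is only a sketch under assumptions: `setup_once`, `setup_done`, and `shared_buf` are hypothetical names, the buffer stands in for whatever data_structures_setup allocates, and C11 atomics are assumed to be available in the bare-metal toolchain.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shared state, not taken from the pull request. */
static atomic_int setup_done;          /* zero-initialized: setup not finished */
static unsigned char *shared_buf;      /* stands in for the test buffers */

/* CPU 0 allocates the shared buffer and then publishes a "ready" flag;
 * every other core spins until the flag is set.
 * Returns 0 on success, -1 if the allocation on CPU 0 failed. */
int setup_once(int pid, size_t bytes)
{
    if (pid == 0) {
        shared_buf = malloc(bytes);
        if (shared_buf == NULL)
            return -1;                 /* flag stays clear on failure */
        atomic_store(&setup_done, 1);  /* publish after the buffer exists */
    } else {
        while (!atomic_load(&setup_done))
            ;                          /* wait for CPU 0 to finish setup */
    }
    return 0;
}
```

Note that in this sketch the other cores would spin forever if the allocation fails, so a real test would need a timeout or an error flag as well.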

davide-giri avatar Apr 23 '20 17:04 davide-giri

Yes, the data_structures_setup routine calls malloc for the buffers being used by later cache_fill tests.

I ran the tests with Spandex and it is working for me.

How many ways does the L2 cache have in your configuration? I just realized that the "ways" parameter passed to cache_fill shouldn't be hardcoded to 4. If your configuration has more or fewer than 4 ways, could you rerun the modified test?
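A sketch of how the hardcoded value could be replaced by a build-time setting. The macro `L2_WAYS` and the helper `fill_footprint_bytes` are hypothetical; the real signature of cache_fill and the way the ESP configuration exposes the number of ways are not shown in this thread.

```c
#include <stddef.h>

/* Assumption: the build system can define L2_WAYS to match the SoC
 * configuration. The fallback of 4 mirrors the value that was hardcoded. */
#ifndef L2_WAYS
#define L2_WAYS 4
#endif

/* Hypothetical helper: number of bytes that must be touched to fill
 * every way of every set once, given the cache geometry. */
size_t fill_footprint_bytes(size_t ways, size_t sets, size_t line_bytes)
{
    return ways * sets * line_bytes;
}
```

For example, `fill_footprint_bytes(L2_WAYS, 512, 16)` gives the footprint for a cache with 512 sets and 16-byte lines; both of those numbers are illustrative, not taken from an actual ESP configuration.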

Thank you!

zzhu35 avatar Apr 23 '20 20:04 zzhu35

We reproduced the issue, and some debugging pointed to a potential corner case that is not handled correctly: two consecutive casa instructions on two different words of the same cache line.
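The access pattern described above can be sketched in portable C11 as two back-to-back compare-and-swap operations on adjacent words that share a cache line. The names `line_pair` and `cas_two_words` are hypothetical, the 16-byte line size is an assumption, and whether the compiler actually emits casa for these calls depends on the toolchain and target flags.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LINE_BYTES 16   /* assumed cache line size, not confirmed in the thread */

/* Two words forced into the same (assumed) cache line:
 * word0 starts on a line boundary and word1 follows it directly. */
struct line_pair {
    alignas(LINE_BYTES) atomic_uint word0;
    atomic_uint word1;
};

/* Issue two consecutive compare-and-swap operations on different words
 * of the same line. Each swaps 0 -> 1; returns true only if both succeed. */
bool cas_two_words(struct line_pair *p)
{
    unsigned expected0 = 0;
    unsigned expected1 = 0;
    bool ok0 = atomic_compare_exchange_strong(&p->word0, &expected0, 1u);
    bool ok1 = atomic_compare_exchange_strong(&p->word1, &expected1, 1u);
    return ok0 && ok1;
}
```

On a correct memory system the first call on a zeroed `line_pair` returns true and leaves both words at 1; a second call returns false because neither word still holds 0.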

We're working on a bug fix and we'll post here when done.

davide-giri avatar May 04 '20 19:05 davide-giri

Thank you for the update! Which level of cache is this bug occurring in? I'm concerned that if it's in the L2, Spandex might also be affected.

zzhu35 avatar May 04 '20 19:05 zzhu35

At the moment it seems it should be in the private L2. If that's the case Spandex may have the same problem, which may not manifest itself because of different timing. We'll know more once we confirm and fix the bug.

davide-giri avatar May 04 '20 20:05 davide-giri

I see. Some of the ESP L2 states are unreachable in Spandex. That's also a potential reason why it was not being triggered.

zzhu35 avatar May 04 '20 20:05 zzhu35