OpenHBMC
Failures on long runs.
This is now a very bad issue. Sorry folks. After switching from BUFG mode to BUFR/BUFIO mode we did see it working well, but just in case we left the loop test running. The first time it ran for about a week before it failed. We are now running it all the time and check every day whether it has failed. And we are seeing failures almost every day. So OpenHBMC is likely to fail within 24 hours of continuous testing.
This is bad; it actually means that OpenHBMC cannot be used in real products, as a failure once a day cannot be tolerated. If it works, it should keep working and not fail every other day.
We are sure that our target hardware is near ideal for HyperRAM testing - all HyperBus signals are LESS than 4 mm long! This is an amazing layout: the HyperRAM sits below the FPGA and the wires are really all in the range of 2..4 mm! It can't be better than this. So it is for sure not a signal integrity issue.
Argh! I recall reports from another HyperRAM IP vendor that some HyperRAM devices themselves have failures, like real loss of memory content. This would mean that the error is really in the HyperRAM device itself. But how to verify it? Right now we just run the memory tests in a forever loop, and the error only tells us the memory width of the failing test; we don't see what data was written or read. That is really not helpful for debugging.
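To get more out of a failure than just the test width, the loop test could record the failing location, the expected and read values, and an immediate second read of the same word. The following is only a rough sketch, not code we are running: all names (`hb_fail_t`, `hb_fill`, `hb_check`, `hb_print_fail`) are made up for illustration, and it assumes the HyperRAM is memory-mapped as 32-bit words in an uncached region, since a data cache would hide the difference between the first and the second read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of the first mismatch found in one test pass. */
typedef struct {
    size_t   index;    /* word index of the failing location */
    uint32_t expected; /* value that was written */
    uint32_t first;    /* value returned by the first read */
    uint32_t second;   /* value returned by an immediate re-read */
} hb_fail_t;

/* Address-dependent pattern; a different seed per pass changes the data. */
static uint32_t hb_pattern(size_t i, uint32_t seed)
{
    return ((uint32_t)i * 2654435761u) ^ seed;
}

/* Write the pattern to the whole region under test. */
static void hb_fill(volatile uint32_t *mem, size_t words, uint32_t seed)
{
    assert(mem != NULL);
    for (size_t i = 0; i < words; i++)
        mem[i] = hb_pattern(i, seed);
}

/* Read back and compare. Returns 0 if clean, 1 if *rep holds a mismatch. */
static int hb_check(volatile uint32_t *mem, size_t words, uint32_t seed,
                    hb_fail_t *rep)
{
    assert(mem != NULL && rep != NULL);
    for (size_t i = 0; i < words; i++) {
        uint32_t exp = hb_pattern(i, seed);
        uint32_t got = mem[i];
        if (got != exp) {
            rep->index    = i;
            rep->expected = exp;
            rep->first    = got;
            rep->second   = mem[i]; /* re-read the same word right away */
            return 1;
        }
    }
    return 0;
}

/* Print the record, the flipped bits, and a rough classification. */
static void hb_print_fail(const hb_fail_t *r)
{
    const char *kind = (r->second == r->expected)
        ? "transient (re-read OK, read path suspected)"
        : "persistent (stored data wrong, or the write failed)";
    printf("FAIL idx=%lu exp=0x%08lx rd1=0x%08lx rd2=0x%08lx xor=0x%08lx %s\n",
           (unsigned long)r->index, (unsigned long)r->expected,
           (unsigned long)r->first, (unsigned long)r->second,
           (unsigned long)(r->first ^ r->expected), kind);
}
```

If the re-read returns the expected value, the data in the chip was fine and the read path (controller, FIFO, capture timing) is the more likely suspect. If the re-read is still wrong, either the content was lost in the HyperRAM or the write never landed correctly, and the XOR mask shows which bits flipped.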
As of now we also do not know whether the problem is still related to the Xilinx FIFO, i.e. essentially the same failure as with the BUFG version, just taking much longer to show up.
We would be really happy to assist in debugging this problem; let us know if we can try something out to rule out some possible causes. I myself have only a few ideas about what we could try:
- change serdes clock from 300 to 301 MHz?
- change IO slew rate to fast?
One interesting option for testing the IP would be a CRUVI loopback adapter, but for this we would need a HyperRAM emulation model; I am guessing the model offered by Cypress would not work well in an FPGA :(
Anyway, we are happy to assist with this issue. We would really like to see HyperRAM work, and for well more than 24 hours!
UPDATE: there is a 3-year-old forum entry about errors that happen every 10..20 hours, with different hardware and a different IP core. https://forum.trenz-electronic.de/index.php/topic,1320.0.html
So there is a chance that something is wrong with the HyperRAM chip itself?
UPDATE2: a different HyperRAM chip, different hardware and a different IP core, and also data corruption: https://community.infineon.com/t5/Hyper-RAM/HyperRAM-Memory-Corruption/td-p/281115
It would be really nice to see WHAT type of errors happen here in our testing...