embeddedsw icon indicating copy to clipboard operation
embeddedsw copied to clipboard

Xil_DCacheInvalidateRange does not flush aligned ranges shorter than a cacheline anymore

Open jmetter opened this issue 1 year ago • 3 comments

The fix in commit https://github.com/Xilinx/embeddedsw/commit/0976d41408dfc4f4cd2be174269fe1ef61176792 changed the behavior of Xil_DCacheInvalidateRange in a way that may or may not have been your intention.

Previously, the cortexr5 function performed a flush (clean and invalidate):

  • on the first cacheline of the range, if the start of the non-empty range is not cacheline aligned
  • on the last cacheline of the range, if the end of the non-empty range is not cacheline aligned

After the fix, this property still holds, except for the special case where the range starts cacheline aligned and is shorter than one cacheline. In that case it now just invalidates the single cacheline without cleaning it first. The changed behavior results from the if clause that presumably was added to avoid duplicated flushes of the same cacheline.

Our code on cortexr5 happened to unintentionally rely on this cacheline clean (that is only documented for cortexa53) and it was not easy to locate the problem after our update of the embeddedsw sources.

The situation is similar but not identical for the cortexa53 version, as that implementation increments opstartaddr by one cacheline if it flushed that cacheline.

jmetter avatar Oct 10 '23 17:10 jmetter

Thanks @jmetter for reporting, we will check and fix it in our next release as needed.

sivadur avatar Oct 11 '23 05:10 sivadur

@jmetter , if I understand correctly, your use case is: In the same cache line you have some DMA address (of small length, much less than 32 bytes) and also some other variables .. So, you request for a InvalidateRange for that small length, but in the process, you loose updates that happened recently to the global vars.. If so, yes, it is a miss. We need to update our cache tests properly to catch for such issues. Thanks again for pointing this out and sorry for the trouble it might have caused you for the debug..

Also, for A53, is it 32 bit or 64 bit your use case is? For 64 bit mode, it is not an issue. Inherently, for A53 64 bit mode, each IVAC (invalidation) is treated as CIVAC .. Which means, underlying a clean and invalidation always happen if the cache line is dirty. For A53 32 bit mode, we will cross-check and let you know.

anirudha1977 avatar Oct 12 '23 04:10 anirudha1977

@anirudha1977 Thank you for looking into this. Your assumption about the use case is correct. There is a variable length DMA transfer starting at a cache aligned address and the Cortex-R5 writes data directly after the DMA buffer. In some cases, the DMA ended up shorter than a single cacheline and in these cases the data written by the CPU was lost.

I already fixed the issue on my side: that call to Xil_DCacheInvalidateRange was badly placed, because I was unaware of the CPU written data on the same cacheline. It just worked because of the flushing in Xil_DCacheInvalidateRange, so nobody noticed.

Sorry for confusing you with the Cortex A53. I mixed up the folders. Both ARMv8/32bit an ARMv8/64bit look fine and as you pointed out, the 64bit version uses CIVAC anyways. The file I wanted to mention was the Cortex A9, as an equivalent if clause was added to that Xil_DCacheInvalidateRange implementation as well and I think that resulted in the same change in behavior. However, I do not develop for the Zynq 7000/ Cortex A9, so it does not affect me.

jmetter avatar Oct 16 '23 13:10 jmetter