Device tree uninitizlied after resized
I believe I found a bug in OpenSBI version 1.6 used in buildroot tag 2024.11.x with kernel 6.12.8. OpenSBI makes two calls to fdt_open_into increasing the length of the device tree by +32 and then +1024 bytes. However this new region of memory is not initialized. During the Linux kernel boot, the kernel performs a crc32 checks on the device tree. Since this region of memory is not initialized the crc32 will fail. I've confirmed this behavior in hardware using the CVW processor running two different FPGAs. To see the details of this debugging process visit https://github.com/openhwgroup/cvw/issues/1264.
Unfortunately I don't fully understand the code, but I did hack together a "poor" quality solution which just initializes the new memory region as a proof a concept. There is most certainly a more elegant solution.
This is the code I added after the variable declarations to fdt_open_into(const void *fdt, void *buf, int bufsize) of file opensbi/lib/utils/libftdi/fdt_rw.c
int index;
for(index = 0; index < bufsize; index++){
((char*) buf)[index] = 0;
}
This bug was also present in OpenSBI version 1.3.
Yes, it is true that OpenSBI uses fdt_open_into() to increase the FDT size without a later fdt_pack() or equivalent, so there can be uninitialized memory between fdt and fdt + fdt_totalsize(fdt).
However, that should be perfectly fine. The FDT itself does not contain any checksum. Linux just computes the crc32 of the FDT memory at two points during its boot process and compares them. This happens long after OpenSBI has finished modifying the FDT. So something during the Linux boot process (appears to) change the FDT. The only obvious ways I can see this happening is if fdt_totalsize(fdt) > 4 MiB, so the FDT overflows the fixmap space reserved for it (and the adjacent fixmap entry is not constant), or some bug is corrupting memory.
BTW your hack completely erases the existing FDT, since OpenSBI always calls fdt_open_into() with fdt == buf, so if Linux is using the FDT provided by OpenSBI, it shouldn't be able to boot.
If I am understanding fdt_open_into() correctly, buf is the destination array and fdt is the source. Zeroing the destination shouldn't erase the existing FDT. I've confirmed this hack does indeed boot CVW in physical hardware.
Going back to the Linux crc32 issue. What you are saying is the first pass creates the crc and the second is checks it. My failure is actually on the first pass when Linux creates the crc32 using uninitialized data. We are found this issue running the processor in a lockstep with the reference model ISA simulator ImperasDV. When CVW reads uninitialized data the values are random, but ImperasDV reads 0 so it detects a mismatch. I'm not sure if there is a way for the simulator to use CVW's values for uninitialized addresses.
Other than the consuming clock cycles, is there a downside to initializing the device tree?
If I am understanding
fdt_open_into()correctly, buf is the destination array and fdt is the source. Zeroing the destination shouldn't erase the existing FDT.
Your understanding is correct. However, OpenSBI passes the same pointer for buf and fdt, to attempt to expand the FDT in place. So zeroing the destination also zeros the existing FDT.
Going back to the Linux crc32 issue. What you are saying is the first pass creates the crc and the second is checks it. My failure is actually on the first pass when Linux creates the crc32 using uninitialized data. We are found this issue running the processor in a lockstep with the reference model ISA simulator ImperasDV. When CVW reads uninitialized data the values are random, but ImperasDV reads 0 so it detects a mismatch. I'm not sure if there is a way for the simulator to use CVW's values for uninitialized addresses.
Ah, so the issue is that Linux reads any uninitialized memory at all.
Other than the consuming clock cycles, is there a downside to initializing the device tree?
Just clock cycles and code size. A slightly better solution would be to shrink fdt_totalsize(fdt) to include only the size actually used (and thus initialized) by the FDT, to avoid wasting the rest of the buffer. Does this patch solve your issue?
--- a/platform/generic/platform.c
+++ b/platform/generic/platform.c
@@ -287,6 +287,8 @@ static int generic_final_init(bool cold_boot)
return rc;
}
+ fdt_pack(fdt);
+
return 0;
}
I must say I'm baffled how CVW could have booted with that hack then.
Your patch worked! Lockstep successfully made it through Linux's crc32 body function. We are now mismatching about 30 million instructions later so I think we can call this a success.
Thank you very much for helping.
Hold on a second. I reran without lockstep and we are stuck during Linux boot. I'll need to confirm what's happening to make sure this isn't hardware bug.
A slightly better solution would be to shrink
fdt_totalsize(fdt)to include only the size actually used (and thus initialized) by the FDT, to avoid wasting the rest of the buffer. Does this patch solve your issue?
Hello! I faced the same problem while trying to simulate the Linux boot process for the Cheshire SoC. Your patch resolved the issue. Maybe it's worth considering committing this change?
Sent as https://lore.kernel.org/opensbi/[email protected]/