pynq.Overlay() always uses cached .hwh information
After migrating to PYNQ 3.0.0, I faced problems when updating existing overlays. Basically, the changes in the .hwh file do not propagate to pynq.Overlay(), which always uses the cached .hwh information.
To reproduce this error:
1. Create a simple FPGA design & generate the overlay
2. Upload the overlay to the FPGA
3. Load the overlay within a Jupyter Notebook via ol = pynq.Overlay("bitstream.bit")
4. Inspect ol.ip_dict
5. Update the FPGA design (change instance names, addressing, ...) & generate a new overlay
6. Repeat step 2 (upload the new overlay)
7. Repeat step 3 (reload the overlay)
8. Repeat step 4 (inspect ol.ip_dict). The displayed information has not changed! (The notebook side of these steps is sketched below.)
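For reference, here is a minimal sketch of the notebook side of these steps (the bitstream name and the axi_gpio instance name are just placeholders; the Vivado regeneration and the file upload happen outside the notebook):
import pynq

# steps 3-4: load the first version of the overlay and inspect its IP dict
ol = pynq.Overlay("bitstream.bit")
print(sorted(ol.ip_dict.keys()))   # e.g. shows axi_gpio_0

# ... regenerate the overlay in Vivado (renamed instance, new addressing, ...)
# and copy the new bitstream.bit / bitstream.hwh over the old files ...

# steps 7-8: load and inspect again
ol = pynq.Overlay("bitstream.bit")
print(sorted(ol.ip_dict.keys()))   # still lists the old instance names (the bug)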
Hi @dspsandbox,
Would you be able to share the .hwh and .bit files you were using for this test? It would be very helpful for recreating the issue.
In the meantime you can call the following between steps 5 and 6 to clear any cached data:
from pynq import PL
PL.reset()
Thanks, Shane
Hi Shane, thank you so much for the quick reply, your solution works :)
Here are the two versions of the same overlay I was using. They include a screenshot of the design, where you can see that the only difference is the instance name of an axi gpio.
BR, Pau
Hi @dspsandbox,
I am interested in understanding under what circumstances the .hwh would change but not the bitstream. Or are you saying that both the bitstream and the .hwh did change?
Hi @mariodruiz, I am always updating/changing the whole overlay (i.e. .hwh and .bit together). The problem is that, after migrating to PYNQ 3.0.0, updates to the design are not visible within the Jupyter notebook. In other words, pynq.Overlay() always returns the IP handles from the first version of my overlay. My assumption was that this is a .hwh cache problem...
Hi. I came here by googling the same behaviour that @dspsandbox reported. This weekend I ran into some weird situations that basically consisted of: 1) generating the hwh/bit from my Vivado design, 2) loading the corresponding overlay from pynq, 3) changing some bits of my Vivado design (e.g. changing one axi_dma parameter, or replacing the axi_dma with an axi_vdma) and generating a new hwh/bit, 4) loading the new overlay, and 5) finding that the "new overlay" behaved like the old one (e.g. complaining that the axi_dma transfer length was too long even though I had increased the limit, or detecting a dma rather than the vdma I had just swapped in). I just realised that restarting and then reloading the overlay seems to solve the issue. I will try out "PL.reset()" (as rebooting every time I need to load a new overlay takes its time).
Regards
I get the same behavior using PYNQ release 3.0.1 on ZCU208. For several weeks I mistook the bug for an issue with the HWH parser not recognizing some custom IP, but then I made a change to an overlay to delete some existing IP and it still showed up in Overlay.ip_dict despite not being present at all in the HWH file. A power cycle fixed it. I'll try the PL.reset() workaround ASAP, but it'd be nice to squash this bug at its root.
I had a similar problem after I renamed the .bit and .hwh files. After a quick glance, it seems to me like it caches the path to the bitstream file, but when the file/filename changes it doesn't invalidate the cache. Right now I don't have enough free time to check it out, but I could have a look during the weekend and maybe do a PR?
This has been causing occasional problems for users and developers of the QICK project.
observed behavior
We did not dive into this in any rigorous way, but our observations were consistent with the others in this thread (and we summarized them in https://github.com/openquantumhardware/qick/wiki/clearing-cached-PYNQ-metadata):
- in some cases, if you load one bitstream (call this A) and then another (call this B), the cached metadata from the bitstream A will be kept
- this seems to happen most often when the A and B bitstreams were loaded from the same file path (in other words, A's .bit+.hwh files were overwritten with B's)
- we are not sure if this is a necessary or sufficient condition - i.e. we have the impression that this bug can occur in other situations, and we are not totally sure that the bug will reproduce 100% of the time under this situation
- loading another bitstream (C) and then B again seemed to fix the problem
- clearing the PL server state with pynq.pl_server.global_state.clear_global_state() seems to reliably fix the problem (a quick check for stale metadata is sketched right after this list)
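As a quick way to confirm that you are looking at stale cached metadata rather than a parser issue, something like the following can compare the TIMESTAMP held by the parser against the .hwh sitting next to the .bit (just a sketch; the filenames are examples and systemgraph._root is an internal attribute, not a public API):
import re
import pynq

ol = pynq.Overlay("qick_216.bit")

# TIMESTAMP that PYNQ's (possibly cached) parser holds for the loaded overlay
cached_ts = ol.device.parser.systemgraph._root.get("TIMESTAMP")

# TIMESTAMP actually present in the .hwh file next to the .bit
with open("qick_216.hwh") as f:
    match = re.search(r'TIMESTAMP="([^"]+)"', f.read(4096))
file_ts = match.group(1) if match else None

if cached_ts != file_ts:
    print("stale metadata: parser says", cached_ts, "but the .hwh on disk says", file_ts)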
reproducing the bug
I did some digging today.
Here's what seems to be a reliable repro.
- ZCU216 with a PYNQ 3.0.1 SD card image
- running in a Jupyter notebook
- .bit and .hwh are from https://s3df.slac.stanford.edu/people/meeg/qick/tprocv2/2025-05-28_216_tprocv2r23_8fullspeed/ and https://s3df.slac.stanford.edu/people/meeg/qick/tprocv2/2025-09-13_216_tprocv2r27_8fullspeed/
code (load the two bitstreams and print the TIMESTAMP field in the .hwh header):
import pynq
!cp /data/fw/2025-05-28_216_tprocv2r23_8fullspeed/qick_216.* .
soc = pynq.Overlay("qick_216.bit")
print("metadata after copying+loading 2025-05-28 bitfile:", soc.device.parser.systemgraph._root.get("TIMESTAMP"))
!cp /data/fw/2025-09-13_216_tprocv2r27_8fullspeed/qick_216.* .
soc = pynq.Overlay("qick_216.bit")
print("metadata after copying+loading 2025-09-13 bitfile:", soc.device.parser.systemgraph._root.get("TIMESTAMP"))
print("\nactual timestamps:")
!head -n2 qick_216.hwh \
/data/fw/2025-05-28_216_tprocv2r23_8fullspeed/qick_216.hwh \
/data/fw/2025-09-13_216_tprocv2r27_8fullspeed/qick_216.hwh
output (on the second line I should see the September 13 timestamp, but I get the old May 28 value):
metadata after copying+loading 2025-05-28 bitfile: Wed May 28 13:38:12 2025
metadata after copying+loading 2025-09-13 bitfile: Wed May 28 13:38:12 2025
actual timestamps:
==> qick_216.hwh <==
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<EDKSYSTEM EDWVERSION="1.2" TIMESTAMP="Sat Sep 13 11:40:05 2025" VIVADOVERSION="2023.1">
==> /data/fw/2025-05-28_216_tprocv2r23_8fullspeed/qick_216.hwh <==
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<EDKSYSTEM EDWVERSION="1.2" TIMESTAMP="Wed May 28 13:38:12 2025" VIVADOVERSION="2023.1">
==> /data/fw/2025-09-13_216_tprocv2r27_8fullspeed/qick_216.hwh <==
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<EDKSYSTEM EDWVERSION="1.2" TIMESTAMP="Sat Sep 13 11:40:05 2025" VIVADOVERSION="2023.1">
possible cause
BitstreamHandler.get_parser() decides whether to use the cache or to parse the HWH file by checking the hash in the cache against the hash of the new .bit file:
https://github.com/Xilinx/PYNQ/blob/19ed17d4fd4ea71f95a97d30f8ccd0d62b8dbaaa/pynq/pl_server/embedded_device.py#L194
But the "hash in the cache" is GlobalState.bitfile_hash and this is a freshly computed hash of the .bit file at the cached location:
https://github.com/Xilinx/PYNQ/blob/19ed17d4fd4ea71f95a97d30f8ccd0d62b8dbaaa/pynq/pl_server/global_state.py#L49
The result is that if you copy a new .bit over the old location, both sides of the comparison hash the same new file, so the hashes always match and the cached metadata is never invalidated.
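To spell the failure out (a conceptual sketch of the comparison, not PYNQ's actual code; it assumes the qick_216 files from the repro above are present):
import hashlib

def sha1_of(path):
    # same kind of hash as PYNQ's bitstream_hash() (SHA-1 of the file contents)
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

cached_path = "qick_216.bit"      # bitfile_name stored in the JSON state file
requested_path = "qick_216.bit"   # same path, now holding the NEW bitstream

hash_in_cache = sha1_of(cached_path)       # recomputed at load time, i.e. of the NEW file
hash_of_request = sha1_of(requested_path)  # also of the NEW file

use_cached_metadata = (hash_in_cache == hash_of_request)  # always True, so the stale .hwh data is reused
print(use_cached_metadata)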
To demonstrate this a little: after running the repro code above, the bitfile_hash returned by pynq.pl_server.global_state.load_global_state() differs from the hash stored in the JSON state file.
code:
print("actual SHA1 hashes:")
!sha1sum qick_216.bit \
/data/fw/2025-05-28_216_tprocv2r23_8fullspeed/qick_216.bit \
/data/fw/2025-09-13_216_tprocv2r27_8fullspeed/qick_216.bit
print("\ncontents of PL state file:")
!cat /usr/local/share/pynq-venv/lib/python3.10/site-packages/pynq/pl_server/global_pl_state.json
print("\n\nload_global_state():")
print(pynq.pl_server.global_state.load_global_state())
output:
actual SHA1 hashes:
fc9b47a22eb19f6f90d747c2538e2a0c299dddba qick_216.bit
ee0f46ae4dddd3f5540bbf207fce4c78334ac515 /data/fw/2025-05-28_216_tprocv2r23_8fullspeed/qick_216.bit
fc9b47a22eb19f6f90d747c2538e2a0c299dddba /data/fw/2025-09-13_216_tprocv2r27_8fullspeed/qick_216.bit
contents of PL state file:
{"bitfile_name": "/home/xilinx/jupyter_notebooks/qick/pyro4/qick_216.bit", "active_name": "ZCU216", "timestamp": "2025/9/15 16:8:52 +160524", "bitfile_hash": "ee0f46ae4dddd3f5540bbf207fce4c78334ac515", "shutdown_ips": {}, "psddr": {"raw_type": 1, "used": 1, "base_address": 0, "size": 268435456, "type": "DDR4", "streaming": false, "idx": 0, "tag": "PSDDR"}}
load_global_state():
bitfile_name='/home/xilinx/jupyter_notebooks/qick/pyro4/qick_216.bit' active_name='ZCU216' timestamp='2025/9/15 16:8:52 +160524' bitfile_hash='fc9b47a22eb19f6f90d747c2538e2a0c299dddba' shutdown_ips={} psddr={'raw_type': 1, 'used': 1, 'base_address': 0, 'size': 268435456, 'type': 'DDR4', 'streaming': False, 'idx': 0, 'tag': 'PSDDR'}
a fix?
It seems like the hash needs to be computed when the state is saved to JSON, not when the state is loaded from JSON.
I don't want to submit this as a PR yet because I have no idea what side effects this has - I don't know if there was a good reason for computing the hash at load.
diff --git a/pynq/pl_server/embedded_device.py b/pynq/pl_server/embedded_device.py
index 0caa4572..b53df208 100644
--- a/pynq/pl_server/embedded_device.py
+++ b/pynq/pl_server/embedded_device.py
@@ -646,7 +646,8 @@ class EmbeddedDevice(XrtDevice):
gs=GlobalState(bitfile_name=str(bitstream.bitfile_name),
timestamp=ts,
active_name=self.name,
- psddr=parser.mem_dict.get("PSDDR", {}))
+ psddr=parser.mem_dict.get("PSDDR", {}),
+ bitfile_hash=bitstream_hash(bitstream.bitfile_name))
ip=parser.ip_dict
for sd_name, details in ip.items():
if details["type"] in ["xilinx.com:ip:pr_axi_shutdown_manager:1.0",
diff --git a/pynq/pl_server/global_state.py b/pynq/pl_server/global_state.py
index 0edcff9c..f327f4ea 100644
--- a/pynq/pl_server/global_state.py
+++ b/pynq/pl_server/global_state.py
@@ -46,7 +46,6 @@ class GlobalState(BaseModel):
def __init__(self, **kwargs):
super().__init__(**kwargs)
- self.bitfile_hash = bitstream_hash(self.bitfile_name)
def initial_global_state_file_boot_check()->None:
""" Performs a check to see if this is a coldstart, if it is then clear the
We will still need to live with this bug for the installed base of 3.0.1 users - for that case, it seems fine to always run pynq.pl_server.global_state.clear_global_state() before loading the bitstream.
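For example, a tiny wrapper along those lines (the helper name here is mine, not anything provided by PYNQ):
import pynq
from pynq.pl_server.global_state import clear_global_state

def load_overlay_fresh(bitfile):
    # drop any cached PL metadata so the .hwh is re-parsed on this load
    clear_global_state()
    return pynq.Overlay(bitfile)

soc = load_overlay_fresh("qick_216.bit")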
Hi @meeg
I think your method might work.
If you'd like to make the PR I'll be happy to test it out.
Thanks,
Josh
Thanks @meeg for your PR. I've now pulled it into the v3.1.1 branch, which will be merged into Master soon.
I used your fix in PR #1510 to support RemoteDevice as it uses the same logic.