pynq.Overlay() always uses cached .hwh information
After migrating to PYNQ 3.0.0, I faced problems when updating existing overlays. Basically, the changes in the .hwh file do not propagate to pynq.Overlay(), which always uses the cached .hwh information.
To reproduce this error:
1. Create a simple FPGA design & generate the overlay
2. Upload the overlay to the FPGA
3. Load the overlay within a Jupyter Notebook via ol = pynq.Overlay("bitstream.bit")
4. Inspect ol.ip_dict
5. Update the FPGA design (change instance names, addressing, ...) & generate a new overlay
6. Repeat step 2 (upload the new overlay)
7. Repeat step 3 (reload the overlay)
8. Repeat step 4 (inspect ol.ip_dict). The displayed information has not changed! (The notebook side of these steps is sketched below.)
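For reference, here is a minimal sketch of the notebook side of these steps (the bitstream name and the axi_gpio instance name are just placeholders; the Vivado regeneration and the file upload happen outside the notebook):
import pynq

# steps 3-4: load the first version of the overlay and inspect its IP dict
ol = pynq.Overlay("bitstream.bit")
print(sorted(ol.ip_dict.keys()))   # e.g. shows axi_gpio_0

# ... regenerate the overlay in Vivado (renamed instance, new addressing, ...)
# and copy the new bitstream.bit / bitstream.hwh over the old files ...

# steps 7-8: load and inspect again
ol = pynq.Overlay("bitstream.bit")
print(sorted(ol.ip_dict.keys()))   # still lists the old instance names (the bug)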
Hi @dspsandbox,
Would you be able to share the .hwh and .bit files you were using for this test? It would be very helpful for recreating the issue.
In the meantime you can call the following between steps 5 and 6 to clear any cached data:
from pynq import PL
PL.reset()
Thanks, Shane
Hi Shane, thank you so much for the quick reply, your solution works :)
Here are the two versions of the same overlay I was using. They include a screenshot of the design, where you can see that the only difference is the instance name of an axi gpio.
BR, Pau
Hi @dspsandbox,
I am interested in understanding under what circumstances the .hwh would change but not the bitstream. Or are you saying that both the bitstream and the .hwh did change?
Hi @mariodruiz, I am always updating/changing the whole overlay (i.e. .hwh and .bit together). The problem is that, after migrating to PYNQ 3.0.0, updates to the design are not visible within the Jupyter notebook. In other words, pynq.Overlay() always returns the IP handles from the first version of my overlay. My assumption was that this is a .hwh cache problem...
Hi. I came here by googling the same behaviour that @dspsandbox reported. This weekend I ran into some weird situations that basically consisted of: 1) generating the hwh/bit from my Vivado design, 2) loading the corresponding overlay from pynq, 3) changing some bits of my Vivado design (e.g. changing one axi_dma parameter, or replacing the axi_dma with an axi_vdma) and generating a new hwh/bit, 4) loading the new overlay, and 5) finding that the "new overlay" behaved like the old one (e.g. complaining that the axi_dma transfer length was too long even though I had increased the limit, or detecting a dma rather than the vdma I had just swapped in). I just realised that restarting and then reloading the overlay seems to solve the issue. I will try out "PL.reset()" (as rebooting every time I need to load a new overlay takes its time).
Regards
I get the same behavior using PYNQ release 3.0.1 on ZCU208. For several weeks I mistook the bug for an issue with the HWH parser not recognizing some custom IP, but then I made a change to an overlay to delete some existing IP and it still showed up in Overlay.ip_dict despite not being present at all in the HWH file. A power cycle fixed it. I'll try the PL.reset() workaround ASAP, but it'd be nice to squash this bug at its root.
I had a similar problem after I renamed the .bit and .hwh files. After a quick glance, it seems to me like it caches the path to the bitstream file, but when the file/filename changes it doesn't invalidate the cache. Right now I don't have enough free time to check it out, but I could have a look during the weekend and maybe do a PR?
This has been causing occasional problems for users and developers of the QICK project.
observed behavior
We did not dive into this in any rigorous way, but our observations were consistent with the others in this thread (and we summarized them in https://github.com/openquantumhardware/qick/wiki/clearing-cached-PYNQ-metadata):
- in some cases, if you load one bitstream (call this A) and then another (call this B), the cached metadata from the bitstream A will be kept
- this seems to happen most often when the A and B bitstreams were loaded from the same file path (in other words, A's .bit+.hwh files were overwritten with B's)
- we are not sure if this is a necessary or sufficient condition - i.e. we have the impression that this bug can occur in other situations, and we are not totally sure that the bug will reproduce 100% of the time under this situation
- loading another bitstream (C) and then B again seemed to fix the problem
- clearing the PL server state with pynq.pl_server.global_state.clear_global_state() seems to reliably fix the problem (a quick check for stale metadata is sketched right after this list)
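As a quick way to confirm that you are looking at stale cached metadata rather than a parser issue, something like the following can compare the TIMESTAMP held by the parser against the .hwh sitting next to the .bit (just a sketch; the filenames are examples and systemgraph._root is an internal attribute, not a public API):
import re
import pynq

ol = pynq.Overlay("qick_216.bit")

# TIMESTAMP that PYNQ's (possibly cached) parser holds for the loaded overlay
cached_ts = ol.device.parser.systemgraph._root.get("TIMESTAMP")

# TIMESTAMP actually present in the .hwh file next to the .bit
with open("qick_216.hwh") as f:
    match = re.search(r'TIMESTAMP="([^"]+)"', f.read(4096))
file_ts = match.group(1) if match else None

if cached_ts != file_ts:
    print("stale metadata: parser says", cached_ts, "but the .hwh on disk says", file_ts)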
reproducing the bug
I did some digging today.
Here's what seems to be a reliable repro.
- ZCU216 with a PYNQ 3.0.1 SD card image
- running in a Jupyter notebook
- .bit and .hwh are from https://s3df.slac.stanford.edu/people/meeg/qick/tprocv2/2025-05-28_216_tprocv2r23_8fullspeed/ and https://s3df.slac.stanford.edu/people/meeg/qick/tprocv2/2025-09-13_216_tprocv2r27_8fullspeed/
code (load the two bitstreams and print the TIMESTAMP field in the .hwh header):
import pynq
!cp /data/fw/2025-05-28_216_tprocv2r23_8fullspeed/qick_216.* .
soc = pynq.Overlay("qick_216.bit")
print("metadata after copying+loading 2025-05-28 bitfile:", soc.device.parser.systemgraph._root.get("TIMESTAMP"))
!cp /data/fw/2025-09-13_216_tprocv2r27_8fullspeed/qick_216.* .
soc = pynq.Overlay("qick_216.bit")
print("metadata after copying+loading 2025-09-13 bitfile:", soc.device.parser.systemgraph._root.get("TIMESTAMP"))
print("\nactual timestamps:")
!head -n2 qick_216.hwh \
/data/fw/2025-05-28_216_tprocv2r23_8fullspeed/qick_216.hwh \
/data/fw/2025-09-13_216_tprocv2r27_8fullspeed/qick_216.hwh
output (on the second line I should see the September 13 timestamp, but I get the old May 28 value):
metadata after copying+loading 2025-05-28 bitfile: Wed May 28 13:38:12 2025
metadata after copying+loading 2025-09-13 bitfile: Wed May 28 13:38:12 2025
actual timestamps:
==> qick_216.hwh <==
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<EDKSYSTEM EDWVERSION="1.2" TIMESTAMP="Sat Sep 13 11:40:05 2025" VIVADOVERSION="2023.1">
==> /data/fw/2025-05-28_216_tprocv2r23_8fullspeed/qick_216.hwh <==
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<EDKSYSTEM EDWVERSION="1.2" TIMESTAMP="Wed May 28 13:38:12 2025" VIVADOVERSION="2023.1">
==> /data/fw/2025-09-13_216_tprocv2r27_8fullspeed/qick_216.hwh <==
<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<EDKSYSTEM EDWVERSION="1.2" TIMESTAMP="Sat Sep 13 11:40:05 2025" VIVADOVERSION="2023.1">
possible cause
BitstreamHandler.get_parser() decides whether to use the cache or to parse the HWH file by checking the hash in the cache against the hash of the new .bit file:
https://github.com/Xilinx/PYNQ/blob/19ed17d4fd4ea71f95a97d30f8ccd0d62b8dbaaa/pynq/pl_server/embedded_device.py#L194
But the "hash in the cache" is GlobalState.bitfile_hash and this is a freshly computed hash of the .bit file at the cached location:
https://github.com/Xilinx/PYNQ/blob/19ed17d4fd4ea71f95a97d30f8ccd0d62b8dbaaa/pynq/pl_server/global_state.py#L49
The result is that if you copy a new .bit over the old location, both sides of the comparison hash the same new file, so the hashes always match and the cached metadata is never invalidated.
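To spell the failure out (a conceptual sketch of the comparison, not PYNQ's actual code; it assumes the qick_216 files from the repro above are present):
import hashlib

def sha1_of(path):
    # same kind of hash as PYNQ's bitstream_hash() (SHA-1 of the file contents)
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()

cached_path = "qick_216.bit"      # bitfile_name stored in the JSON state file
requested_path = "qick_216.bit"   # same path, now holding the NEW bitstream

hash_in_cache = sha1_of(cached_path)       # recomputed at load time, i.e. of the NEW file
hash_of_request = sha1_of(requested_path)  # also of the NEW file

use_cached_metadata = (hash_in_cache == hash_of_request)  # always True, so the stale .hwh data is reused
print(use_cached_metadata)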
To demonstrate this a little: after running the repro code above, the bitfile_hash returned by pynq.pl_server.global_state.load_global_state() differs from the hash stored in the JSON state file.
code:
print("actual SHA1 hashes:")
!sha1sum qick_216.bit \
/data/fw/2025-05-28_216_tprocv2r23_8fullspeed/qick_216.bit \
/data/fw/2025-09-13_216_tprocv2r27_8fullspeed/qick_216.bit
print("\ncontents of PL state file:")
!cat /usr/local/share/pynq-venv/lib/python3.10/site-packages/pynq/pl_server/global_pl_state.json
print("\n\nload_global_state():")
print(pynq.pl_server.global_state.load_global_state())
output:
actual SHA1 hashes:
fc9b47a22eb19f6f90d747c2538e2a0c299dddba qick_216.bit
ee0f46ae4dddd3f5540bbf207fce4c78334ac515 /data/fw/2025-05-28_216_tprocv2r23_8fullspeed/qick_216.bit
fc9b47a22eb19f6f90d747c2538e2a0c299dddba /data/fw/2025-09-13_216_tprocv2r27_8fullspeed/qick_216.bit
contents of PL state file:
{"bitfile_name": "/home/xilinx/jupyter_notebooks/qick/pyro4/qick_216.bit", "active_name": "ZCU216", "timestamp": "2025/9/15 16:8:52 +160524", "bitfile_hash": "ee0f46ae4dddd3f5540bbf207fce4c78334ac515", "shutdown_ips": {}, "psddr": {"raw_type": 1, "used": 1, "base_address": 0, "size": 268435456, "type": "DDR4", "streaming": false, "idx": 0, "tag": "PSDDR"}}
load_global_state():
bitfile_name='/home/xilinx/jupyter_notebooks/qick/pyro4/qick_216.bit' active_name='ZCU216' timestamp='2025/9/15 16:8:52 +160524' bitfile_hash='fc9b47a22eb19f6f90d747c2538e2a0c299dddba' shutdown_ips={} psddr={'raw_type': 1, 'used': 1, 'base_address': 0, 'size': 268435456, 'type': 'DDR4', 'streaming': False, 'idx': 0, 'tag': 'PSDDR'}
a fix?
It seems like the hash needs to be computed when the state is saved to JSON, not when the state is loaded from JSON.
I don't want to submit this as a PR yet because I have no idea what side effects this has - I don't know if there was a good reason for computing the hash at load.
diff --git a/pynq/pl_server/embedded_device.py b/pynq/pl_server/embedded_device.py
index 0caa4572..b53df208 100644
--- a/pynq/pl_server/embedded_device.py
+++ b/pynq/pl_server/embedded_device.py
@@ -646,7 +646,8 @@ class EmbeddedDevice(XrtDevice):
gs=GlobalState(bitfile_name=str(bitstream.bitfile_name),
timestamp=ts,
active_name=self.name,
- psddr=parser.mem_dict.get("PSDDR", {}))
+ psddr=parser.mem_dict.get("PSDDR", {}),
+ bitfile_hash=bitstream_hash(bitstream.bitfile_name))
ip=parser.ip_dict
for sd_name, details in ip.items():
if details["type"] in ["xilinx.com:ip:pr_axi_shutdown_manager:1.0",
diff --git a/pynq/pl_server/global_state.py b/pynq/pl_server/global_state.py
index 0edcff9c..f327f4ea 100644
--- a/pynq/pl_server/global_state.py
+++ b/pynq/pl_server/global_state.py
@@ -46,7 +46,6 @@ class GlobalState(BaseModel):
def __init__(self, **kwargs):
super().__init__(**kwargs)
- self.bitfile_hash = bitstream_hash(self.bitfile_name)
def initial_global_state_file_boot_check()->None:
""" Performs a check to see if this is a coldstart, if it is then clear the
We will still need to live with this bug for the installed base of 3.0.1 users - for that case, it seems fine to always run pynq.pl_server.global_state.clear_global_state() before loading the bitstream.
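For example, a tiny wrapper along those lines (the helper name here is mine, not anything provided by PYNQ):
import pynq
from pynq.pl_server.global_state import clear_global_state

def load_overlay_fresh(bitfile):
    # drop any cached PL metadata so the .hwh is re-parsed on this load
    clear_global_state()
    return pynq.Overlay(bitfile)

soc = load_overlay_fresh("qick_216.bit")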
Hi @meeg
I think your method might work.
If you'd like to make the PR I'll be happy to test it out.
Thanks,
Josh
Thanks @meeg for your PR. I've now pulled it into the v3.1.1 branch, which will be merged into Master soon.
I used your fix in PR #1510 to support RemoteDevice as it uses the same logic.