Exploit JLink capabilities in AP

Open martinpriestley opened this issue 3 years ago • 5 comments

Hi,

Alongside @ksigurdsson I've been using pyOCD to test an M3 in a custom ASIC, via a SEGGER JLink debugger. During this we've noticed that loading and dumping regions of memory is very slow (loadmem of ~10kB took over 5s, which is significant when multiplied by even a handful of test cases). I traced this to the implementation of _write_memory_block32 in coresight/ap.py, which ends up calling across to pylink (and thus the actual jlink C library) once per 32-bit write.

As an experiment I hacked this function to directly call memory_write in pylink/jlink.py, thus passing the whole block to the jlink API in one go. loadmem now takes ~50ms for 10kB.
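A minimal sketch of why the call count matters, assuming each call into the library carries a roughly fixed overhead. `CountingLink` is a hypothetical stand-in for a pylink `JLink` object (it only mimics the `memory_write32(addr, data)` call shape), so this runs without hardware and does not reproduce the timings quoted above.

```python
class CountingLink:
    """Hypothetical stand-in for pylink.JLink that counts library calls."""

    def __init__(self):
        self.calls = 0
        self.memory = {}

    def memory_write32(self, addr, data):
        # One call here models one round trip into the J-Link library.
        self.calls += 1
        for i, word in enumerate(data):
            self.memory[addr + 4 * i] = word


def write_per_word(link, addr, words):
    """Models the slow path: one library call per 32-bit word."""
    for i, word in enumerate(words):
        link.memory_write32(addr + 4 * i, [word])


def write_block(link, addr, words):
    """Models the patched path: the whole block in a single call."""
    link.memory_write32(addr, list(words))


words = list(range(2560))  # 2560 words of 4 bytes = 10 KiB

slow = CountingLink()
write_per_word(slow, 0x20000000, words)

fast = CountingLink()
write_block(fast, 0x20000000, words)

print(slow.calls, fast.calls)
```

Both paths leave the same data in memory; only the number of crossings into the library differs.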

It seems to me that, given the structure of the pyOCD code, the right way to exploit the abilities of the jlink library is to create a specialised implementation of the AP classes in ap.py. Do you agree? If so, I'm happy to contribute the implementation (with a little guidance on where to put it and how to choose to use the specialised version).

Thanks

martinpriestley avatar Dec 04 '20 12:12 martinpriestley

Hi @martinpriestley, yeah, unfortunately the structure of the J-Link low-level API doesn't really make it possible to do fast memory transfers. There is already a method in pyocd to support accelerated memory transfers, but I didn't implement this for J-Link because it's not clear how to map APs to the J-Link memory transfer APIs. If it can be made to work, then it should always be used (no need for a special version).

The DebugProbe.get_memory_interface_for_ap() method can return a MemoryInterface instance associated with the given AP. MEM_AP will already call this method on the probe, and, if it returns an object, use it for memory transfers instead of AP register accesses.

Here's the STLink implementation as an example: https://github.com/pyocd/pyOCD/blob/4b45409675a5583c860a583d78f25b0c5598b7e3/pyocd/probe/stlink_probe.py#L207
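As a rough sketch of what a J-Link counterpart could look like: the class below forwards block transfers to pylink-style `memory_write32()` / `memory_read32()` calls. The class name `JLinkMemorySketch` and the `FakeLink` test double are hypothetical. In pyOCD the class would derive from `MemoryInterface` and be returned from `get_memory_interface_for_ap()`; both are left out here so the example runs on its own.

```python
class JLinkMemorySketch:
    """Hypothetical block-transfer memory interface backed by a pylink-style link."""

    def __init__(self, link):
        self._link = link

    def write_memory_block32(self, addr, data):
        # Hand the whole block to the library in a single call.
        self._link.memory_write32(addr, list(data))

    def read_memory_block32(self, addr, size):
        # size is a count of 32-bit words, not bytes.
        return list(self._link.memory_read32(addr, size))


class FakeLink:
    """Hypothetical in-memory stand-in for pylink.JLink, for demonstration only."""

    def __init__(self):
        self.words = {}

    def memory_write32(self, addr, data):
        for i, word in enumerate(data):
            self.words[addr + 4 * i] = word & 0xFFFFFFFF

    def memory_read32(self, addr, num_words):
        return [self.words.get(addr + 4 * i, 0) for i in range(num_words)]


mem = JLinkMemorySketch(FakeLink())
mem.write_memory_block32(0x20000000, [0x11, 0x22, 0x33])
readback = mem.read_memory_block32(0x20000000, 3)
print(readback)
```

This sketch does not address which AP the transfer goes through, which is the open question for J-Link.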

The J-Link API (at least what's exposed from pylink-square) lets you specify a memory zone to access for memory transfers. There is also a memory_zones() method that returns a list of the available zones. But when I call this on the dual-core nRF5340, the result is an empty list. So it's not clear if you can even specify which AP/core to access memory through—which makes it unusable for anything more complex than a single AP/single core system.

flit avatar Dec 04 '20 17:12 flit

Hi,

Thanks for pointing me at get_memory_interface_for_ap(). Makes for a more elegant local patch even if we can't find a universal solution.

Re selecting between multiple APs - how is this done at the moment? In the JLINK documentation I see the CORESIGHT_SetIndexAHBAPToUse and CORESIGHT_SetIndexAPBAPToUse commands, which could be set via pylink's exec_command(). I'm not sure how we discover the available APs.
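A small sketch of how the command string might be built and sent. The `name = value` syntax is an assumption taken from my reading of the J-Link command string documentation and is untested on hardware. `RecordingLink` is a hypothetical stand-in for a pylink `JLink` object, of which only `exec_command()` is used.

```python
def ahb_ap_select_command(ap_index):
    """Build the J-Link command string that selects an AHB-AP by index.

    Assumes the 'CORESIGHT_SetIndexAHBAPToUse = <index>' syntax.
    """
    if not 0 <= ap_index <= 0xFF:
        raise ValueError("ADIv5 AP index must fit in 8 bits")
    return "CORESIGHT_SetIndexAHBAPToUse = %d" % ap_index


class RecordingLink:
    """Hypothetical stand-in for pylink.JLink that records commands."""

    def __init__(self):
        self.commands = []

    def exec_command(self, cmd):
        self.commands.append(cmd)


link = RecordingLink()
link.exec_command(ahb_ap_select_command(1))
print(link.commands)
```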

Re zones, this appears to be about non-uniform address maps, so isn't necessarily a feature of all multi-core systems. JLINK docs say:

Silicon Labs EFM8 [...] is the only CPU so far, J-Link supports, that provides such zones

which might explain why you got an empty list.

martinpriestley avatar Dec 05 '20 11:12 martinpriestley

I hadn't even thought of checking the J-Link docs for command strings. The CORESIGHT_SetIndexAHBAPToUse and APB variant commands certainly look like they might work. We'll have to see how they work in practice.

There are some open questions (aside from the obvious, "will they work at all?"), such as what the actual distinction between the commands is in the internal J-Link implementation. When does it actually use the CORESIGHT_SetIndexAHBAPToUse value versus the CORESIGHT_SetIndexAPBAPToUse value? Or, how to handle AXI-APs? Many Cortex-A systems have an APB-AP for the CPU cluster debug, another APB-AP for system-level trace control, and an AXI-AP for direct access to the system bus fabric. While pyOCD doesn't yet support Cortex-A debug, I want to keep the CoreSight-level support for it in place.

Pyocd discovers the APs when it connects. You can see this printed out in the log. The get_memory_interface_for_ap() method is passed an ap_address parameter, so it should have everything it needs to build the J-Link command string.

There are two versions of ADI (Arm Debug Interface), v5.x and v6, and two corresponding AP versions, v1 and v2. All existing Cortex-M devices use ADIv5.2, but Cortex-M55 based devices will use ADIv6. The main difference between the two versions is how APs are addressed. ADIv5.x uses a single 8-bit index (as you can see in the J-Link docs for these commands), while ADIv6 replaces that with an APB bus with 12- to 52-bit addresses (usually just 32-bit in systems I've seen).

The point of the ADI/AP version difference is that, until J-Link supports ADIv6, the get_memory_interface_for_ap() code will have to only return an object if it is passed an APv1 address. Pyocd will then fall back to low level AP-register-based memory transfers if it sees an ADIv6 system (assuming the J-Link firmware would even let you connect).
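In code, that gate could look something like the sketch below. `APVersion` and `APAddress` here are simplified, hypothetical stand-ins for pyOCD's own AP address types, and `make_interface` is a hypothetical factory for the accelerated memory interface. Returning `None` is what lets the caller fall back to AP-register-based transfers.

```python
from dataclasses import dataclass
from enum import Enum


class APVersion(Enum):
    """Simplified stand-in for pyOCD's AP version enum."""
    APv1 = 1  # ADIv5.x: 8-bit AP index
    APv2 = 2  # ADIv6: AP located by bus address


@dataclass(frozen=True)
class APAddress:
    """Simplified stand-in for pyOCD's AP address objects."""
    ap_version: APVersion
    address: int


def get_memory_interface_for_ap(ap_address, make_interface):
    """Return an accelerated interface for APv1 addresses, else None."""
    if ap_address.ap_version is not APVersion.APv1:
        # ADIv6 AP: not supported by the fast path in this sketch.
        return None
    return make_interface(ap_address.address)


v1 = get_memory_interface_for_ap(APAddress(APVersion.APv1, 0), lambda idx: ("fast", idx))
v2 = get_memory_interface_for_ap(APAddress(APVersion.APv2, 0x10000), lambda idx: ("fast", idx))
print(v1, v2)
```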

One more thing, we'll need to ensure that the J-Link probe is locked during the memory transfer to prevent another thread from changing the selected AP in the middle of a transfer.
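A minimal sketch of that locking, assuming AP selection is done with a command string immediately before each transfer. `LockedJLinkAccess` and `RecordingLink` are hypothetical names; the real probe class would hold the lock. The point is only that the select and the transfer happen under the same lock.

```python
import threading


class RecordingLink:
    """Hypothetical stand-in for pylink.JLink that records the call order."""

    def __init__(self):
        self.events = []

    def exec_command(self, cmd):
        self.events.append(("cmd", cmd))

    def memory_write32(self, addr, data):
        self.events.append(("write", addr))


class LockedJLinkAccess:
    """Selects the AP and performs the transfer as one atomic step."""

    def __init__(self, link):
        self._link = link
        self._lock = threading.RLock()

    def write_block32(self, ap_index, addr, data):
        with self._lock:
            # No other thread can change the selected AP until the write is done.
            self._link.exec_command("CORESIGHT_SetIndexAHBAPToUse = %d" % ap_index)
            self._link.memory_write32(addr, list(data))


link = RecordingLink()
access = LockedJLinkAccess(link)


def worker(ap_index):
    # Each AP gets its own address range so the log can be checked afterwards.
    for n in range(50):
        access.write_block32(ap_index, ap_index * 0x1000 + 4 * n, [n])


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock in place, every recorded select is immediately followed by the write that belongs to it, regardless of how the threads interleave.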

flit avatar Dec 06 '20 21:12 flit

> When does it actually use the CORESIGHT_SetIndexAHBAPToUse value versus the CORESIGHT_SetIndexAPBAPToUse value?

That seems to be by Cortex family. JLink doc says:

7.14.1.2 CORESIGHT_SetIndexAHBAPToUse This command is used to select a specific AHB-AP to be used when connected to an ARM Cortex-M device.

and

7.14.1.3 CORESIGHT_SetIndexAPBAPToUse This command is used to select a specific APB-AP to be used when connected to an ARM Cortex-A or Cortex-R device.

from which we might infer that the JLink memory transfer functions never use an AXI-AP. I'm not familiar enough with the ARM CPU families, and which of them JLink claim to support, to say whether that sounds sensible.

Falling back to AP-register mode for anything JLink doesn't support seems the right course of action. If the AP selection mechanisms don't work, we could even fall back for any system with multiple APs (single AP support would stop me annoying you...)

martinpriestley avatar Dec 07 '20 11:12 martinpriestley

> That seems to be by Cortex family.

Being based on the Cortex family raises a potential issue.

The J-Link DLL requires you to pass in the device name. Right now, pyOCD just sets the device name to "Cortex-M4" by default (but controllable with the 'jlink.device' option). The J-Link log shows that it performs CoreSight discovery (finding APs and debug IP like cores, FPB, DWT, etc). If the CPU type is not Cortex-M4 (whatever is expected based on the device name) then it will report a warning, but continue on.

The question is whether the J-Link DLL uses the expected (based on device type) core type, or the actual core type that it sees during discovery. There is also the question of how it behaves when presented with a so-called A+M system where you have both Cortex-A/R and Cortex-M cores.

(Aside: I've been wondering if it's possible to automatically select the J-Link device name. You can iterate over available devices, but there are around 7700 so I'm not sure how fast it would be. And that's assuming the J-Link device names reliably match the part numbers available in CMSIS-Packs; builtin pyocd targets would be easier to handle.)
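A sketch of that lookup, assuming pylink's `num_supported_devices()` and `supported_device(index)` calls and a `name` attribute on the returned device info. It builds a case-insensitive index once so that repeated lookups stay cheap; how long the initial walk over the full device list takes is not something this sketch answers. `FakeLink`, `FakeDeviceInfo`, and the example names are hypothetical.

```python
def build_device_index(link):
    """Map lower-cased J-Link device names to their exact spelling."""
    index = {}
    for i in range(link.num_supported_devices()):
        name = link.supported_device(i).name
        index.setdefault(name.lower(), name)
    return index


def find_device_name(index, part_number):
    """Return the J-Link device name matching part_number, or None."""
    return index.get(part_number.lower())


class FakeDeviceInfo:
    """Hypothetical stand-in for the device info object pylink returns."""

    def __init__(self, name):
        self.name = name


class FakeLink:
    """Hypothetical stand-in for pylink.JLink with a tiny device list."""

    def __init__(self, names):
        self._names = list(names)

    def num_supported_devices(self):
        return len(self._names)

    def supported_device(self, index):
        return FakeDeviceInfo(self._names[index])


index = build_device_index(FakeLink(["Cortex-M4", "EXAMPLE_PART"]))
print(find_device_name(index, "cortex-m4"))
print(find_device_name(index, "unknown-part"))
```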

> If the AP selection mechanisms don't work, we could even fall back for any system with multiple APs

Very true!

flit avatar Dec 07 '20 18:12 flit