pyOCD icon indicating copy to clipboard operation
pyOCD copied to clipboard

pyOCD sensitive to timing?

Open jsiverskog opened this issue 1 year ago • 5 comments

hi, we're having issues with using pyOCD in some cases. the setup is a raspberry pi - that is, a limited environment when it comes to resources.

when running from command line everything works fine, but if having chrome active at the same time (in our case it runs a factory production system) pyOCD operations sometimes fail ("pyocd.core.exceptions.TransferError: No ACK received"). my only explanation is that chrome somehow locks up the system for short periods of time, and pyOCD is timing sensitive.

i would expect that all timing sensitive operations would be done between the probe and the DUT - without python being involved.

can someone confirm whether this is true? and if so, can something be done about it?

jsiverskog avatar Jun 29 '23 08:06 jsiverskog

Hi @jsiverskog,

Can you please detail your setup? My understanding is that you are using PSoC devices, and have to gain control of the CPU before the SWD pins are muxed to GPIO—is that right?

The target shouldn't be sensitive to timing of low level operations such as AP/DP transfers via SWD (or JTAG) controlled by the host. The standard Arm switch-to-SWD and DP connect procedures are also not timing sensitive, nor are things like the issuing of memory transfers via an AP.

Timing can come into play in at least two cases:

  1. A run-to-completion operation, where state is retained after the operation is completed but the host needs to wait for the operation to complete. An example would be CPU register access, where a transfer-completed flag must be checked. This kind of operation should not be affected by a host whose performance is degraded (eg by Chrome).
  2. Windowed operations, where you are trying to synchronising with and control an asynchronous process running on the target. Examples: reading a variable in memory while it has a certain value, or gaining halting the CPU before a certain stage of firmware boot is reached. This case could definitely be affected by the host running slower.

If Chrome is slowing down pyocd while an operation of the second kind is taking place, the issue you're having could appear. But of course it depends on exactly what the host is trying to do when it gets slowed down. Since you are seeing a No ACK response, it means that there is no response at all via SWD. This implies one of:

  • The initial connection to the target failed
  • The target was asynchronously reset at some point and hasn't been reconnected
  • SWD was muxed out
  • The target is asleep

(There could be other conditions too.)

Finally, there are some occasional USB related issues when running on a Raspberry Pi. I've seen it myself very rarely. But it seems it might be related to issuing USB requests to quickly, and the failures are different from the No ACK error you describe. In any case, I've been setting up an RPi4 test environment so I can try to reproduce and solve these issues.

flit avatar Jun 30 '23 19:06 flit

hi @flit , thanks for your thorough answer. this is with a psoc 6 device yes. the issue appears even with unflashed boards, so the swd pin mux thing should not be the issue here (it's done in software).

i'm aware that the usb stack on the raspberry pi is not perfect - we've had a lot of other issues with it. i opened this issue to get a better understanding of the timing requirements, not to blame anything or anyone. currently i have a workaround for this by avoiding doing anything compute intensive in chrome while swd operations are ongoing. on the raspberry pi, you don't have to do a lot in chrome for it to be "compute intensive" - in my case it was updating a progress bar.

when i got the no ack error, i was calling ConnectHelper.session_with_chosen_probe or FileProgrammer.program. i'm not sure what they do in terms of swd operations.

jsiverskog avatar Jul 06 '23 13:07 jsiverskog

Hi @jsiverskog,

No worries—I wasn't thinking you were trying to blame anything! 😄 I'd just like to help if I can, especially since running pyocd on an RPi is a great solution for CI.

Are you using the PSoC6 CMSIS-Pack, or the builtin target support? There are a few differences in how Cypress implementing things, not to mention the differences caused by the use of CMSIS-Pack debug sequence.

An interesting, but not definitive, test would be to try similar operations (connect or flash) with a different vendor's MCU while chrome is busy doing things. Basically, I'm concerned that there is a PSoC-specific race condition during connect that we limited (or no) control over when performing operations from the host.

When you get the error during ConnectHelper.session_with_chosen_probe(), does that happen if you only connect and disconnect without attempting to program? What parameters are you passing?

One possibility is that there is a time window after reset during which SWD is inaccessible. (The NXP Kinetis devices have something like this.)

flit avatar Jul 07 '23 19:07 flit

@flit , great!

we use the built-in target support. i have not had time yet to see if the same issue appears with another mcu - it could absolutely be something psoc 6 specific here, in terms of timing.

i call session_with_chosen_probe with:

ConnectHelper.session_with_chosen_probe(
            blocking=False,
            unique_id=[unique id],
            target_override="CY8C6xx7_nosmif",
            frequency=1000000,
            keep_unwritten=True,
            chip_erase="sector",
            connect_mode="pre-reset",
            auto_open=False,
            options={"reset.post_delay": 0, "reset.dap_recover.timeout": 0})

sometimes it fails already when calling the above function, which is done prior to flashing, so it seems to happen also without trying to flash.

i think i should confirm whether it occurs with other mcu:s or not.

jsiverskog avatar Jul 12 '23 08:07 jsiverskog

Hi @jsiverskog, thanks for the additional info.

A debug log might help pin down where exactly it fails.

From what you've said, it appears that it can fail at different times? But it would be interesting to know if the same types of operations are being performed when it fails (eg post-reset or something like that).

To start with, could you please attach a few debug logs with increasing levels of detail? The code below uses pyocd's color logger, but you can also use logging.basicConfig().

First, add this.

import logging
logging.basicConfig(level=logging.DEBUG)

Then create a pyocd.yaml config file in the working directory of your script that contains the following log config to write the log to a file:

logging:
  loggers:

  root:
    handlers: [logfile]
  formatters:
    precise:
      format: "[%(relativeCreated)07d:%(levelname)s] %(message)s [%(module)s]"
  handlers:
    logfile:
      class: logging.FileHandler
      formatter: precise
      filename: pyocd_log.txt
      mode: w
      delay: false

(If you try to set the 'pyocd' logger to DEBUG in the yaml config, it will also turn on all trace loggers, so don't do that.)

Generate the first log file containing just basic debug log output with the above config.

Then enable CMSIS-DAP logging to generate a second log file by adding this to the yaml under logging/loggers:

    pyocd.probe.cmsis_dap_probe.trace: {level: DEBUG}

If possible, try to generate these only for connect/disconnect, not flash programming.

Finally, it would be helpful to try generating the CMSIS-DAP trace log for a successful case and multiple failing cases so they can be compared. If you want to compare runs yourself, either strip out the %(relativeCreated)07d: part of the log format or use a regex search/replace filter, otherwise small timing differences will cause all the lines to compare as different.

You can email the log files to me if you want.


You don't need to generate it now, but I'll go ahead and mention the config for enabling AP+DP trace logging:

    pyocd.coresight.ap.trace: {level: DEBUG}
    pyocd.coresight.dap.trace: {level: DEBUG}

If we see something in the log that looks suspicious, we can use this to narrow down what exactly is happening at the CoreSight level.

Beware that the AP+DP trace log will include the hex data for everything written to target memory, including firmware that is programmed.

flit avatar Jul 12 '23 21:07 flit