
J-Link program speed

Open loveshipting opened this issue 2 years ago • 3 comments

Hi, my test setup is a J-Link V9 on Windows 10. When I program with pyocd, the speed is about 10 KB/s.

$ pyocd flash -t CY8C624AFNI-S2D43 ./cypress_flash.hex  --pack E:/pyocd_pack/Cypress.PSoC6_DFP.1.2.0.pack -f 5M
0001401:INFO:load_cmd:Loading E:\WorkSpace\pyOCD\cypress_flash.hex
[---|---|---|---|---|---|---|---|---|----]
[========================================]
0219615:INFO:loader:Erased 2097152 bytes (8 sectors), programmed 2097152 bytes (4096 pages), skipped 0 bytes (0 pages) at 9.44 kB/s

Then I ran "speed_test.py"; the speed is about 12 KB/s.

$ python ./test/speed_test.py
INFO:pyocd.board.board:Target type is cy8c6xxa
INFO:pyocd.board.board:Target type is cy8c6xxa
INFO:pyocd.coresight.dap:DP IDR = 0x6ba02477 (v2 rev6)
INFO:pyocd.coresight.ap:AHB-AP#0 IDR = 0x84770001 (AHB-AP var0 rev8)
INFO:pyocd.coresight.ap:AHB-AP#1 IDR = 0x84770001 (AHB-AP var0 rev8)
INFO:pyocd.coresight.ap:AHB-AP#2 IDR = 0x24770011 (AHB-AP var1 rev2)
INFO:pyocd.coresight.rom_table:AHB-AP#0 Class 0x1 ROM table #0 @ 0xf1000000 (designer=034 part=102)
INFO:pyocd.coresight.rom_table:AHB-AP#1 Class 0x1 ROM table #0 @ 0xf0000000 (designer=034 part=102)
INFO:pyocd.coresight.rom_table:[0]<e00ff000:ROM class=1 designer=43b part=4c0>
INFO:pyocd.coresight.rom_table:  AHB-AP#1 Class 0x1 ROM table #1 @ 0xe00ff000 (designer=43b part=4c0)
INFO:pyocd.coresight.rom_table:  [0]<e000e000:SCS v6-M class=14 designer=43b part=008>
INFO:pyocd.coresight.rom_table:  [1]<e0001000:DWT v6-M class=14 designer=43b part=00a>
INFO:pyocd.coresight.rom_table:  [2]<e0002000:BPU v6-M class=14 designer=43b part=00b>
INFO:pyocd.coresight.rom_table:[1]<f0002000:CTI M0+ class=9 designer=43b part=9a6 devtype=14 archid=1a14 devid=1040800:0:0>
INFO:pyocd.coresight.rom_table:[2]<f0003000:MTB M0+ class=9 designer=43b part=932 devtype=31 archid=0a31 devid=0:0:0>
INFO:pyocd.coresight.rom_table:AHB-AP#2 Class 0x1 ROM table #0 @ 0xe00ff000 (designer=034 part=102)
INFO:pyocd.coresight.rom_table:[0]<e0080000:CTI class=9 designer=43b part=906 devtype=14 archid=0000 devid=40800:0:0>
INFO:pyocd.coresight.rom_table:[3]<e008e000:TPIU M3 class=9 designer=43b part=923 devtype=11 archid=0000 devid=ca1:0:0>
INFO:pyocd.coresight.rom_table:[4]<e007f000:ROM class=1 designer=034 part=102>
INFO:pyocd.coresight.rom_table:  AHB-AP#2 Class 0x1 ROM table #1 @ 0xe007f000 (designer=034 part=102)
INFO:pyocd.coresight.rom_table:  [0]<e000e000:SCS v7-M class=14 designer=43b part=00c>
INFO:pyocd.coresight.rom_table:  [1]<e0001000:DWT v7-M class=14 designer=43b part=002>
INFO:pyocd.coresight.rom_table:  [2]<e0002000:FPB v7-M class=14 designer=43b part=003>
INFO:pyocd.coresight.rom_table:  [3]<e0000000:ITM v7-M class=14 designer=43b part=001>
INFO:pyocd.coresight.rom_table:  [4]<e0042000:CTI class=9 designer=43b part=906 devtype=14 archid=0000 devid=40800:0:0>
INFO:pyocd.coresight.rom_table:  [5]<e0041000:ETM M4 class=9 designer=43b part=925 devtype=13 archid=0000 devid=0:0:0>
INFO:pyocd.coresight.cortex_m:CPU core #0 is Cortex-M0+ r0p1
INFO:pyocd.coresight.cortex_m:CPU core #1 is Cortex-M4 r0p1
INFO:pyocd.coresight.cortex_m:FPU present: FPv4-SP-D16-M
INFO:pyocd.coresight.dwt:2 hardware watchpoints
INFO:pyocd.coresight.fpb:4 hardware breakpoints, 0 literal comparators
INFO:pyocd.coresight.dwt:4 hardware watchpoints
INFO:pyocd.coresight.fpb:6 hardware breakpoints, 4 literal comparators


------ TEST RAM READ / WRITE SPEED [uncached 8-bit] ------
Writing 32768 byte took 2.534 seconds: 12932.860 B/s
Reading 32768 byte took 2.641 seconds: 12406.393 B/s
TEST PASSED


------ TEST ROM READ SPEED [uncached 8-bit] ------
Reading 1048576 byte took 84.726 seconds: 12376.128 B/s
TEST PASSED


------ TEST RAM READ / WRITE SPEED [uncached 32-bit] ------
Writing 32768 byte took 2.122 seconds: 15444.448 B/s
Reading 32768 byte took 2.151 seconds: 15231.991 B/s
TEST PASSED


------ TEST ROM READ SPEED [uncached 32-bit] ------
Reading 1048576 byte took 69.954 seconds: 14989.456 B/s
TEST PASSED


------ TEST RAM READ / WRITE SPEED [cached 8-bit, pass 1] ------
Writing 32768 byte took 2.106 seconds: 15562.334 B/s
Unexpected ram read elapsed time of 0!
Reading 32768 byte took 0.000 seconds: 0.000 B/s
TEST PASSED


------ TEST ROM READ SPEED [cached 8-bit, pass 1] ------
Reading 1048576 byte took 69.281 seconds: 15135.033 B/s
TEST PASSED


------ TEST RAM READ / WRITE SPEED [cached 8-bit, pass 2] ------
Writing 32768 byte took 2.110 seconds: 15529.141 B/s
Reading 32768 byte took 0.001 seconds: 32856551.153 B/s
TEST PASSED


------ TEST ROM READ SPEED [cached 8-bit, pass 2] ------
Reading 1048576 byte took 0.006 seconds: 175234939.481 B/s
TEST PASSED
INFO:pyocd.coresight.dap:DP IDR = 0x6ba02477 (v2 rev6)


------ Speed Test Performance ------
Target             RAM Read Speed   RAM Write Speed    ROM Read Speed

cy8c6xxa              12.406 KB/s       12.933 KB/s       12.376 KB/s

But if I use openocd, the speed is about 90 KB/s. Is that normal?

***************************************
** Silicon: 0xE455, Family: 0x102, Rev.: 0x12 (A1)
** Detected Device: CY8C624AFNI-S2D43
** Detected Main Flash size, kb: 2048
** Flash Boot version: 3.1.0.378
** SFlash version: 292144
** Chip Protection: NORMAL
***************************************
Info : psoc6.cpu.cm4: hardware has 6 breakpoints, 4 watchpoints
Info : psoc6.cpu.cm4: external reset detected
Info : Listening on port 3333 for gdb connections
Info : Listening on port 3334 for gdb connections
adapter speed: 5000 kHz
Info : SWD DPIDR 0x6ba02477
target halted due to debug-request, current mode: Thread
xPSR: 0x41000000 pc: 0x000000e0 msp: 0x080ff800
** psoc6.cpu.cm4: Ran after reset and before halt...
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x0000012a msp: 0x080ff800
** Programming Started **
auto erase enabled
Info : Flash write discontinued at 0x100227c8, next section at 0x10025000
Info : Padding image section 0 at 0x100227c8 with 56 bytes (bank write end alignment)
[100%] [################################] [ Erasing     ]
[100%] [################################] [ Programming ]
Info : Padding image section 1 at 0x101eb1a4 with 92 bytes (bank write end alignment)
[100%] [################################] [ Erasing     ]
[100%] [################################] [ Programming ]
wrote 1870336 bytes from file ./build/CY8CPROTO-062-4343W_B20/Debug/SmartWatch.hex in 19.714502s (92.648 KiB/s)
** Programming Finished **

** Verify Started **
verified 1870188 bytes in 3.504242s (521.184 KiB/s)
** Verified OK **
** Resetting Target **
Info : SWD DPIDR 0x6ba02477

loveshipting avatar Dec 07 '21 06:12 loveshipting

Ooh, that's quite a difference! pyocd is currently (considerably) slower programming flash through a J-Link than letting the J-Link do the programming (which is how openocd works). That's because pyocd uses the low-level J-Link DAP commands instead of the high level J-Link flash programming commands. This is mostly for compatibility and consistent behaviour, but using the low-level commands is a lot slower due to the way they work.
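To put the gap in numbers, here is a rough back-of-the-envelope sketch in Python using only figures from the logs above. It assumes the numeric prefixes in the pyocd log lines (0001401 and 0219615) are millisecond timestamps, so the elapsed time covers erase, program, and any other overhead between those two lines; that is why the result can come out slightly lower than the rate pyocd itself reports. The helper name `throughput_kib_per_s` is made up for this illustration.

```python
# Figures copied from the logs in this issue.
PYOCD_BYTES = 2_097_152      # "programmed 2097152 bytes"
PYOCD_START_MS = 1_401       # timestamp on the "Loading ..." line (assumed ms)
PYOCD_END_MS = 219_615       # timestamp on the "Erased ... programmed ..." line (assumed ms)

OPENOCD_BYTES = 1_870_336    # "wrote 1870336 bytes"
OPENOCD_SECONDS = 19.714502  # "in 19.714502s"


def throughput_kib_per_s(num_bytes: int, seconds: float) -> float:
    """Return the transfer rate in KiB/s (1 KiB = 1024 bytes)."""
    return num_bytes / seconds / 1024


pyocd_seconds = (PYOCD_END_MS - PYOCD_START_MS) / 1000
pyocd_rate = throughput_kib_per_s(PYOCD_BYTES, pyocd_seconds)
openocd_rate = throughput_kib_per_s(OPENOCD_BYTES, OPENOCD_SECONDS)
speedup = openocd_rate / pyocd_rate

print(f"pyocd:   {pyocd_rate:.2f} KiB/s over {pyocd_seconds:.1f} s")
print(f"openocd: {openocd_rate:.2f} KiB/s over {OPENOCD_SECONDS:.1f} s")
print(f"openocd is roughly {speedup:.1f}x faster")
```

Under those assumptions the two runs differ by about an order of magnitude, which matches the "10 KB/s vs 90 KB/s" impression in the report.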

There's a lot of room for improvement, but I haven't had time to do it myself. This is an area ripe for someone from the community to work on. 😉 Sorry the performance is so bad right now.

flit avatar Dec 07 '21 22:12 flit

stlink v2-1 is significantly faster than jlink in pyocd (roughly 5 times; I can do more precise benchmarks). Is that expected, given that stlink has limited low-level command capabilities? (At least in openocd, stlink is an HLA, i.e. a high-level adapter.)

diggit avatar Apr 16 '22 08:04 diggit

Yep, that's expected at the moment. STLink actually has pretty good low-level commands. JLink's are pretty close to the same. The major difference is that STLink makes it easier to use high level memory transfer commands, and 1) they work with any AP, 2) you can configure the AHB/AXI transfer attributes. Neither of which are possible with JLink afaik. So there's no reason to use HLA with STLink (this wasn't true for early STLink2 firmware versions).

It's definitely possible to improve JLink performance in pyocd to approximately equal STLink or CMSIS-DAPv2, and it's not really all that hard. But someone other than me will have to implement it if it's to be done in the foreseeable future. The main changes required are to use high level memory transfers if the JLink device type is configured correctly (so basically duplicating the pyocd target type name, but sometimes with a slightly different spelling of the part number 😞), plus a fallback to the non-accelerated memory transfers (i.e. the current method) for secondary cores and other APs.
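As a very rough illustration of that "fast path plus fallback" idea, here is a self-contained Python sketch. Nothing in it is pyocd or J-Link API: `MemoryRouter`, `fast_read`, `slow_read`, `device_name_known`, and `primary_ap` are all hypothetical names, and the two read functions are fakes backed by a byte string. It only shows the routing decision: use the accelerated transfer when the probe knows the device and the access targets the primary AP, otherwise fall back to the DAP-level path.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MemoryRouter:
    """Hypothetical router between accelerated and DAP-level memory reads."""
    device_name_known: bool  # did the probe accept the device type name?
    primary_ap: int          # the only AP the accelerated path is assumed to reach
    fast_read: Callable[[int, int], bytes]       # (addr, size) -> data
    slow_read: Callable[[int, int, int], bytes]  # (ap, addr, size) -> data
    log: List[str] = field(default_factory=list)

    def read(self, ap: int, addr: int, size: int) -> bytes:
        # Accelerated transfers only when the device is configured and the
        # access goes through the primary AP.
        if self.device_name_known and ap == self.primary_ap:
            self.log.append("fast")
            return self.fast_read(addr, size)
        # Secondary cores and other APs keep using the current slow method.
        self.log.append("slow")
        return self.slow_read(ap, addr, size)


# Fake 16-byte memory standing in for a real target.
memory = bytes(range(16))


def fake_fast_read(addr: int, size: int) -> bytes:
    return memory[addr:addr + size]


def fake_slow_read(ap: int, addr: int, size: int) -> bytes:
    return memory[addr:addr + size]


router = MemoryRouter(True, 0, fake_fast_read, fake_slow_read)
data_primary = router.read(0, 4, 4)    # primary AP: accelerated path
data_secondary = router.read(1, 4, 4)  # other AP: fallback path
print(router.log)  # prints ['fast', 'slow']
```

A real implementation would also need the part-number mapping mentioned above, which this sketch leaves out entirely.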

flit avatar Apr 21 '22 21:04 flit