usbipd-win icon indicating copy to clipboard operation
usbipd-win copied to clipboard

Troubles flashing with SEGGER J-Link to nrf9160dk

Open glarsennordic opened this issue 2 years ago • 28 comments

Intro

Hi, I'm not quite sure if this is an issue with usb-ipd, with SEGGER, or somewhere in between, but I figured I should start here and see what I can glean.

Myself and a couple of others have been having issues, recently, flashing nrf9160dk devices using (in essence) SEGGER J-Link inside of WSL via usbipd.

This actually used to work for me, some time in January, but at some point, without my (apparent) intervention, the following error began cropping up when attempting to flash:

The JLinkARM DLL timed out while communicating to the J-Link probe

I thought, at first, this was just an issue with a faulty usbpcap install from Wireshark, but now a colleague of mine is experiencing this issue on a brand new laptop with a fresh install of WSL, usbipd, SEGGER J-Link, and the associated tools.

In case anyone wants to and has the hardware to recreate our exact setup, we are building and flashing samples and applications inside the nRF Connect SDK (MQTT Multi Service Sample in my case), and we are using west to accomplish both build and flash.

west actually calls nrfjprog under the hood, which in turn makes use of the JLink DLL from SEGGER. The JLink DLL, nrfjprog, etc, are installed manually within WSL. I would like to stress that all of this did, at some point work for me, so installation issues seem less likely. Furthermore, several features of JLink actually appear to work (more on that in a bit). It is just flashing that is failing.

Even more confoundingly, this issue sometimes just magically resolves itself. For about 3 hours during the authoring of this issue, I was actually able to flash one of our boards. More on that later too.

What works and what doesn't

Here is what works:

  • Power-cycling the nrf9160dk causes it to unbind from usbipd (as one would expect)
  • Merely pressing the reset button on the device causes no disconnection, and it still appears under nrfjprog --ids (which lists the available SEGGER ids)
  • Running any combination of nrfjprog -r, nrfjprog -p, nrfjprog --recover, nrfjprog -e, which are various ways to wipe/erase, hard-, and soft-reset the device causes no error, nor any disconnection ,and the device is still visible under nrjprog --ids

However, when my colleague tries to run: nrfjprog --program zephyr/merged.hex --chiperase --verify -f NRF91 --snr 960095070 --log, which attempts to flash the application binary (by erasing), this is the output instead:

Parsing image file.
Verifying programming.
ERROR: [  JLink] - Communication timed out: Requested 8196 bytes, received 0 bytes !
ERROR: [SeggerBackend] - JLinkARM.dll ReadMem returned error 1.
ERROR: [SeggerBackend] - JLinkARM.dll ReadMem returned error 1.
ERROR: [SeggerBackend] - JLinkARM.dll ReadMem returned error 1.
ERROR: [SeggerBackend] - JLinkARM.dll ReadMem returned error 1.
ERROR: [SeggerBackend] - JLinkARM.dll ReadMem returned error 1.
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [  nRF91] - The read access failed due to memory protection.
ERROR: [  nRF91] - Failed to read from flash
ERROR: [  nRF91] - Flash verification failed.
ERROR: [  nRF91] - Failed while verifying file zephyr/merged.hex.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 3874.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4625.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 3874.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4625.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4334.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4382.
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4455.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4382.
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4455.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4382.
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4455.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4382.
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4455.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4382.
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4455.
ERROR: The JLinkARM DLL timed out while communicating to the J-Link probe.
ERROR: If the error condition persists, run the same command again with
ERROR: argument -- log, contact Nordic Semiconductor and provide the generated
ERROR: log.log file to them.

Notice I said "colleague", because, as mentioned before, this magically stopped being a problem for me! (for a while, anyways)

All of this, of course, is after running through various versions of the standard usbipd setup. (I will give a detailed description of one such set-up, and its trace logs, in a later section)

Basically, SEGGER seems to work, except when doing this one very specific operation.

Favoritism:

One fascinating thing we discovered while testing all of this out is that, when swapping boards:

  1. I can only flash my board from WSL
  2. Only I can flash my board from WSL
  3. This was not the case when we started today (I could not flash my board, or any boards at all, from WSL)

Rebooting WSL with wsl --shutdown had no effect on this,

Restarting both WSL and the usbipd server, however, finally killed my special ability to flash my own board. I did that with:

PS C:\Windows\system32> wsl --shutdown
PS C:\Windows\system32> sc stop usbipd
PS C:\Windows\system32> sc start usbipd
PS C:\Windows\system32> usbipd wsl attach -b 4-1

Detailed Reproduction

Speaking frankly, I'm not convinced usbipd is at fault here. However, the folks here seem very knowledgeable, and on the off-chance one of you can recognize the problem that myself and my colleagues are having, I would like to post my detailed process and my detailed logs here.

Before getting to this section, I was using an old version of usbipd. I have updated to the latest to give the highest chance of success. Currently, the following is true of my setup:

Windows: Version 10.0.19044 Build 19044
WSL: 5.10.102.1-microsoft-standard-WSL2
Ubuntu release: 20.04
nrfjprog version: 10.15.2 external
JLinkARM.dll version: 7.60g
Installed JLink package name: JLink_Linux_V760g_x86_64
usbpid version: 2.3.0+42.Branch.master.Sha.3d9f5c5acc4e133ab8147684ad1463cbaec43240

Previously, I had been using:

Windows: Version 10.0.19044 Build 19044
WSL: 5.10.102.1-microsoft-standard-WSL2
Ubuntu release: 20.04
nrfjprog version: 10.15.2 external
JLinkARM.dll version: 7.60g
Installed JLink package name: JLink_Linux_V760g_x86_64
usbpid version: 2.1.0+14.Branch.master.Sha.0e5a276542476f24b2dc6de8d07c82c9fb8912c7

And the two sets of info for my colleagues (experiencing the same issue) are:

Windows: Microsoft Windows 10 Enterprise version 10.0.19044 Build 19044
WSL: 5.10.102.1 with WSL2
Ubuntu: Ubuntu-20.04
nrfjprog version: 10.16.0 external
JLinkARM.dll version: 7.68
Installed JLink package: JLink_Linux_V768_x86_64.tgz
usbipd: 
2.3.0+42.Branch.master.Sha.3d9f5c5acc4e133ab8147684ad1463cbaec43240

and

Windows: Microsoft Windows 10 Enterprise version 10.0.19044 Build 19044
WSL: 5.10.102.1 with WSL2
Linux pesk 5.10.102.1-microsoft-standard-WSL2 #1 SMP Wed Mar 2 00:30:59 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 20.04.4 LTS
nrfjprog version: 10.16.0 external
JLinkARM.dll version: 7.66a
Installed JLink package: JLink_Linux_V766a_x86_64.deb
usbipd: 
2.2.0+72.Branch.master.Sha.00b9e5ec41cb89af4ec628fe83625ad2a50d321d

The first step, directly after reboot, in an admin-level powershell terminal:

sc stop usbipd
usbipd server Logging:LogLevel:Default=Trace

This complains: usbipd: error: Another instance is already running.

So I kill usbipd directly from task manager, and then continue:

usbipd server Logging:LogLevel:Default=Trace

Logs begin to be output. I will attach a full copy of them from this point forward.

Next, in a non-admin powershell context:

usbipd list

This yields:

Connected:
BUSID  VID:PID    DEVICE                                                        STATE
4-1    1366:1055  JLink CDC UART Port (COM6), JLink CDC UART Port (COM5), J...  Shared

Persisted:
GUID                                  DEVICE
********-****-****-****-************  JLink CDC UART Port (COM21), JLink CDC UART Port (COM20),...i

(non-relevant entries removed)

So, I run:

usbipd wsl attach -b 4-1

(Note that I must at some point before this step open Ubuntu WSL)

Now, in WSL: lsusb yields:

Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 1366:1055 SEGGER J-Link
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

and dmesg | grep tty yields:

[    0.074823] printk: console [tty0] enabled
[   21.736946] cdc_acm 1-1:1.0: ttyACM0: USB ACM device
[   21.738222] cdc_acm 1-1:1.2: ttyACM1: USB ACM device
[   21.739433] cdc_acm 1-1:1.4: ttyACM2: USB ACM device

Next, in linux, I run the following magic permissions commands. This has, for reasons unknown to me, always been required in order to get segger flashing working, even back when we were not having any problems:

sudo chmod -R a+rw /dev/bus/usb/*
sudo chmod -R a+rw /dev/ttyACM*

Now, I list SEGGER ids just to make sure everything is in order. nrfjprog --ids yields 960076031, as expected.

Next, I cd into my build directory, and run:

nrfjprog --program 'zephyr/merged.hex' --chiperase --verify -f NRF91 --snr 960076031 --log

Yielding:

Parsing image file.
Verifying programming.
ERROR: [  JLink] - Communication timed out: Requested 8196 bytes, received 0 bytes !
ERROR: [SeggerBackend] - JLinkARM.dll ReadMem returned error 1.
ERROR: [SeggerBackend] - JLinkARM.dll ReadMem returned error 1.
ERROR: [SeggerBackend] - JLinkARM.dll ReadMem returned error 1.
ERROR: [SeggerBackend] - JLinkARM.dll ReadMem returned error 1.
ERROR: [SeggerBackend] - JLinkARM.dll ReadMem returned error 1.
ERROR: [SeggerBackend] - JLinkARM.dll reported "-608525408", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-608525680", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-608525440", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-608525680", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-608525440", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-608525680", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-608525440", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-608525680", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-608525440", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported "-608525680", "An unknown error.".
ERROR: [  nRF91] - The read access failed due to memory protection.
ERROR: [  nRF91] - Failed to read from flash
ERROR: [  nRF91] - Flash verification failed.
ERROR: [  nRF91] - Failed while verifying file zephyr/merged.hex.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 3790.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4535.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 3790.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4535.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4244.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4292.
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4365.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4292.
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4365.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4292.
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4365.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4292.
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4365.
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4292.
ERROR: [SeggerBackend] - JLinkARM.dll reported "-1", "An unknown error.".
ERROR: [SeggerBackend] - JLinkARM.dll reported error -256 at line 4365.
ERROR: The JLinkARM DLL timed out while communicating to the J-Link probe.
ERROR: If the error condition persists, run the same command again with
ERROR: argument -- log, contact Nordic Semiconductor and provide the generated
ERROR: log.log file to them.

And now, if I try to run nrfjprog --ids the result is simply blank.

The output from usbipd is surprisingly short:

dbug: Microsoft.Extensions.Hosting.Internal.Host[1]
      Hosting starting
dbug: Usbipd.PcapNg[1000]
      usbipd:PcapNg:Path = ''
dbug: Usbipd.Server[1000]
      2.3.0+42.Branch.master.Sha.3d9f5c5acc4e133ab8147684ad1463cbaec43240
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Production
info: Microsoft.Hosting.Lifetime[0]
      Content root path: C:\Windows\system32
dbug: Microsoft.Extensions.Hosting.Internal.Host[2]
      Hosting started
dbug: Usbipd.Server[1000]
      new connection from 172.25.30.169
dbug: Usbipd.Server[1000]
      connection closed
dbug: Usbipd.Server[1000]
      new connection from 172.25.30.169
dbug: Usbipd.ConnectedClient[1000]
      Received opcode: OP_REQ_IMPORT
info: Usbipd.ConnectedClient[1]
      Client 172.25.30.169 claimed device at 4-1 (USB\VID_1366&PID_1055\000960076031).
dbug: Usbipd.AttachedClient[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: Usbipd.AttachedClient[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: Usbipd.AttachedClient[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: Usbipd.AttachedClient[1000]
      Trapped SET_CONFIGURATION: 1
dbug: Usbipd.AttachedClient[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: Usbipd.AttachedClient[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: Usbipd.AttachedClient[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: Usbipd.AttachedClient[1000]
      USBSUP_XFER_STALL -> EPIPE -> -32
dbug: Usbipd.ConnectedClient[1000]
      Unbind or unplug while attached
info: Usbipd.ConnectedClient[2]
      Client 172.25.30.169 released device at 4-1 (USB\VID_1366&PID_1055\000960076031).
dbug: Usbipd.Server[1000]
      connection closed

This actually seems to vary between runs, as though sometimes it gets further into the process than others. Though it is not clear why or how.

That's pretty much all the information I have on this. Any insight would be greatly appreciated. Thank you!

glarsennordic avatar Jul 16 '22 00:07 glarsennordic

Unbind or unplug while attached

This means exactly what it says. Considering that you did not specifically unbind the device, from the Windows point of view it is as if the device was unplugged (a so-called "surprise removal"). This can be caused by a device reset power, or maybe the device goes into a different mode and reports itself as a different device ID during/after flashing. In that case it is to Windows as if the original device is unplugged and then another device is plugged in.

In any case, when this happens, usbipd-win will close the connection to the USBIP client because it is told by Windows that the device is no longer present. This should also be visible in the Linux syslog, i.e. it is no longer attached to Linux. This will certainly lead to errors for any software that had a device handle open, since the device is logically no longer there.

Can you check in Windows device manager, or with a tool like usbview.exe (part of Windows SDK), if the device is indeed reporting a different device ID after the error?

dorssel avatar Jul 16 '22 00:07 dorssel

@glarsennordic Any news?

dorssel avatar Aug 23 '22 07:08 dorssel

@dorssel

Thanks for taking the time to respond, sorry for the delay in my reply.

USBView doesn't seem to list a de vice ID, so I will give the Companion Hub Symbolic Link Name, as that seems to be the closest thing to a device ID I can find in the listed information

Prior to attempting to flash, this reads:

USB#VID_2109&PID_0813#5&f1c342&0&14#{f18a0e88-c30c-11d0-8815-00a0c906bed8}

After attempting to flash:

USB#VID_2109&PID_0813#5&f1c342&0&14#{f18a0e88-c30c-11d0-8815-00a0c906bed8}

I will note that the USB device viewer deselected the device after error, so it is going offline in some regard briefly.

What is strange is that about a year ago, none of this was a problem. All of these tools worked just fine together.

glarsennordic avatar Aug 23 '22 16:08 glarsennordic

The code has changed quite a lot, especially between 1.x and 2.x, in detecting devices. That was necessary to support more device types, but for you it may have resulted in a regression. Can you test with the --auto-attach flag? This should re-attach the device after the brief detach due to the device resetting (which looks like an unplug/replug to Windows).

dorssel avatar Aug 23 '22 17:08 dorssel

Oh, neat, that flag seems useful! Using it (full command: usbipd wsl attach --busid 1-3 --auto-attach) gets me a bit closer, but it would seem the permissions hacks I am doing with

sudo chmod -R a+rw /dev/bus/usb/*
sudo chmod -R a+rw /dev/ttyACM*

are not persisting across the re-attach.

Do you have any idea why these permissions hacks are necessary in the first place? I suspect user error, but I have not been able to find a proper solution.

glarsennordic avatar Aug 23 '22 18:08 glarsennordic

I think you can solve that by getting udev running. That's the service that normally takes care of such things, but it isn't setup by WSL. Search through the issues (also the closed ones) on how to get it up and running. It is also mentioned here: https://github.com/dorssel/usbipd-win/wiki/WSL-support#udev. I don't know the details though...

dorssel avatar Aug 24 '22 09:08 dorssel

PS: see also https://github.com/dorssel/usbipd-win/issues/321#issuecomment-1180727251. The mentioned wiki-update seems like the way to go...

dorssel avatar Aug 24 '22 09:08 dorssel

@dorssel Hi, another update:

I've got appropriate udev rules working. However, I'm seeing some strange behavior:

When running with the --auto-attach, I see a detach/reattach log which is pretty useful.

Before today, the device would always detach briefly, and then reattach in the log.

Now that udev rules are set up, the device has the correct permissions after re-attaching, but I've begun seeing some new behavior:

  1. Sometimes the device never detaches at all. Flashing and verification simply completes as if the device had never been disconnected or reset. This results in behavior identical to what I used to see around a year ago
  2. Sometimes the device detaches, and then fails to reattach until I power-cycle it. This results in output looking like:
Detached
usbip: error: Attach Request for 1-3 failed - Device busy (exported)
usbip: error: Attach Request for 1-3 failed - Device not found
  1. Sometimes the device does not reattach quickly enough for J-Link to successfully continue its job.

It is worth noting that I have not once seen J-Link succeed when an reattach occurred. My only successes have been when the device mysteriously does not detach at all.

glarsennordic avatar Aug 30 '22 21:08 glarsennordic

Can you reproduce case 2: "Sometimes the device detaches, and then fails to reattach until I power-cycle it". And then not powercycle. Instead, can you find the device in Windows device manager? Is it missing? Or is it showing with a warning? If the latter, what error code?

dorssel avatar Aug 30 '22 21:08 dorssel

Of course, I was frequently reproducing case 2 until you asked me to!

Finally got it again.

Both USBView and Device Manager show the "J-Link driver" on port_#0003.Hub_#0001, which I am presuming corresponds to the 1-3 I have to enter in usbipd.

Device status in DeviceManager states: "This device is working properly."

I don't see anything warning-like anywhere here in Device Manager.

I'll stick around for at least the next 10 minutes, in case there are any screenshots or specific things you'd like me to look at for this particular reproduction.

More observations:

  1. The device occasionally detaches twice during a single flash
  2. Whenever flashing succeeds, it seems to continue to succeed on subsequent attempts until I power cycle

glarsennordic avatar Aug 30 '22 21:08 glarsennordic

Can you verify that the Vendor and Pid are still what you expect? Those are listed in the details tab of the device properties in device manager.

What is weird is that the device is present, without error, and usbipd does not think it is...

Do you run the usbipd server with full debug logging when this happens? Any weird logs there?

dorssel avatar Aug 30 '22 21:08 dorssel

Also, in case 2, what is the output of usbipd list. Does it show the device in the top list (present) or bottom list (persisted)?

dorssel avatar Aug 30 '22 21:08 dorssel

I am pretty sure I have the device plugged into a different USB port, now, so the symbolic link name is slightly different: USB#ROOT_HUB30#4&c852270&0&0#{f18a0e88-c30c-11d0-8815-00a0c906bed8}

But that f18a0e88-c30c-11d0-8815-00a0c906bed8 matches up with before, so I'd say it is truly the same device.

I am not running in full debug mode right now, sorry about that. I can try to reproduce again in that mode. I will hold off on that though until you give the goahead since it will require me to start from scratch, and perhaps you would like me to check something else first

Opening a second adminstrator powershell session and running usbipd list (since the current one is blocked running in --auto-attach mode) yields:

1-3    1366:0101  J-Link driver                                                 Not shared

Maybe this is a bug in the re-attach logic?

glarsennordic avatar Aug 30 '22 21:08 glarsennordic

"Not shared" is what worries me... Is this usbipd list or usbipd wsl list? I need the former. If it already is the former, then it looks to me that the device reports as "different" somehow, because it should have been "Shared". Is it also listed as persisted perhaps? Is "1366:0101" what it is supposed to be?

dorssel avatar Aug 30 '22 21:08 dorssel

"Not found" is consistent with "Not shared". The problem really is that for some reason usbipd thinks this is a different device that is not shared. But if that is the case, it should report the "original" device as persisted...

dorssel avatar Aug 30 '22 21:08 dorssel

This is usbipd list

Regrettably, I haven't been paying attention to the VID:PID listed by usbipd list, so I don't know whether it has changed. I haven't seen any similar numbers listed in USBView at any point either.

glarsennordic avatar Aug 30 '22 21:08 glarsennordic

OK, now you can power cycle the device. And then do another usbipd list again. Is it now showing as shared? And is the VID:PID still 1366:0101?

dorssel avatar Aug 30 '22 21:08 dorssel

Upon power-cycling, the following two lines were produced in the attach powershell window:

usbip: error: Attach Request for 1-3 failed - Device busy (exported)
Attached

I accidentally killed the process when trying to copy and paste those lines. I did not manage to check usbipd list before then.

But, running usbipd list now I see

1-3 1366:1055 JLink CDC UART Port (COM24), JLink CDC UART Port (COM23),... Attached

Which is quite different.

glarsennordic avatar Aug 30 '22 21:08 glarsennordic

Aha! So ... the device is reporting as a different device upon power-cycle. What you can try is the following:

Once you reach this sittuation again (case 2). Do another usbipd bind -b 1-3. This will also share "the other identity". Fortunately, the re-attach script does not care. It will lock on to 1-3, no matter what. But for usbipd, both devices need to be shared.

dorssel avatar Aug 30 '22 21:08 dorssel

Do you need to access the device from Windows as well? If not, I recommend the --force flag for both binds. This will tell Windows to never user the device (it will permanently load the proxy driver). That will help with the timing a lot.

Then again, if you do need to access the device from Windows, then this is not an option.

dorssel avatar Aug 30 '22 21:08 dorssel

I certainly do not need to use the device windows-side! Though I am using usbipd wsl attach not usbipd bind. It's been a while since I read the source code, but I thought that usbipd wsl attach enabled force by default, right? Or am I mistaken, and you suggest using --force directly during usbipd wsl attach?

I will try as you suggest, in the hopes that it will generate useful debugging information, although I will note that it is strange that the device sometimes does not appear to detach according to usbipd at all, and furthermore I don't think I will ever be able to run usbipd bind -b 1-3 quickly enough for flashing to work when 2) occurs.

In fact, even when reattach succeeds, the flash fails unfortunately

glarsennordic avatar Aug 30 '22 21:08 glarsennordic

wsl subcommands are macros. wsl attach does a normal bind, if it was never done before, followed by an attach from the client.

To use --force you must do so by manually binding (once) from an administrator prompt. This is persisted across reboots, so it really is only once. This is preferred if you really do not need the device Windows-side. I recommend it.

Furthermore: to usbipd it really looks like the old device 1366:1055 is unplugged, and then a different device 1366:0101 is plugged in. This will always lead to a detach and require a re-attach. This is how Windows works, there is nothing that can be done about that. I guess the JLink is flaky, and sometimes goes into VCOM mode, and sometimes it doesn't.

If you don't need VCOM mode, it looks like it can be disabled altogether, see https://wiki.segger.com/Using_J-Link_VCOM_functionality.

dorssel avatar Aug 30 '22 21:08 dorssel

BTW: regarding your previous comment. You misunderstand the meaning of "bind". Binding needs to be done only once (per device ID). It persists across reboots. After that, the client can attach. So, with my proposed bind --force, you will still use the same auto-attach script. The only difference will be that Windows won't have to switch drivers all the time (making attach a lot faster).

dorssel avatar Aug 30 '22 22:08 dorssel

Right, thank you for the assistance! I will attempt this at some point in the near future, but for now I have to move on

I think I miscommunicated; I will need to use the device from Windows sometimes, just never at the same time I am using it on WSL. My bad, I misunderstood what you meant.

As such, I am assuming an unbind call will undo the forced bind should I need to switch back to windows?

Also, as for VCOM, how did you determine that the cause is going into VCOM mode? I believe you, but it makes me think there is something here that you spotted which I did not.

glarsennordic avatar Aug 30 '22 22:08 glarsennordic

Yes, unbind will also undo --force. But you will have to do that manually as well, because once you use --force Windows cannot use the device until you manually unbind.

1366:1055 is the device ID for JLink in VCOM mode. 1366:0101 is the device ID for JLink in raw mode.

I guess you have a Segger PLUS (or something), since not all Seggers have a VCOM mode at all. Most people use JLink only in raw mode. VCOM mode can be disabled (I believe). This would stop your JLink from switching between the two modes...

dorssel avatar Aug 30 '22 22:08 dorssel

The real problem here is that your JLink is switching between modes, which to Windows looks like unplugging one device and plugging in a totally different device. This will lead to a detach and require a new attach.

dorssel avatar Aug 30 '22 22:08 dorssel

I have a hunch that VCOM mode is the default, and our flashing script / JLink is briefly switching out of VCOM mode in order to perform flashing, and then switching back into VCOM mode when done. At least, during a successful run.

If this happens quickly enough, is it possible that usbipd / the auto-attach script might not even notice? Then the explanation for detach would be that if the process fails, it aborts before switching back into VCOM mode, hence why it never reattaches

In order for that hunch to be correct, though, somehow J-Link within WSL is managing to use the non-VCOM-mode device without that device (mode) specifically being attached or bound via usbipd (at least, during a successful run). Is that at all possible to your knowledge?

I'll test this hunch when I return to the problem by checking the PID before flash attempt

Thank you again for all of the help

glarsennordic avatar Aug 30 '22 22:08 glarsennordic

I don't think that is possible. When the JLink switches mode, Windows will see that as an unplug/re-plug. This will immediately detach the device. No further commands to the "old" attach will be possible.

You are right, though. VCOM is the default mode (after power cycle). This is also the initial bind/attach that you used. Somehow, the JLink is switched to raw mode. I believe this can (should?) only happen if it gets told to from within WSL, perhaps even the flash script itself. Maybe it is timing. If all goes well, the flash succeeds in time and the JLink will stay in VCOM mode. But sometimes maybe the flash program thinks that things went wrong (or too slow). And maybe it tries to recover by putting JLink in raw mode. On real hardware, this would be fine. But for WSL it just means that the JLink will be detached altogether and is now "gone".

This may be (?!?) solved by also sharing the raw device ID. This will allow the auto-attach script to (hopefully in time) attach the "other" Jlink device (the raw mode). And perhaps that will allow the flash program to indeed recover. If this is the case, it will probably switch back to VCOM mode, after which yet again another detach/attach cycle will kick in...

dorssel avatar Aug 30 '22 22:08 dorssel

Any news on this issue?

dorssel avatar Sep 28 '22 23:09 dorssel

Very sorry, I've been too busy as of late to attend to this

I think it is very likely something to do with our tools. Feel free to close this issue, and I can always reopen it when I get a chance to reach out to the right people

glarsennordic avatar Oct 04 '22 22:10 glarsennordic