labgrid icon indicating copy to clipboard operation
labgrid copied to clipboard

Flashing with SDWire racy

Open agners opened this issue 4 years ago • 10 comments

Using slower SD cards I see kernel errors in dmesg at the very end of flashing:

[86887.602403] scsi_io_completion_action: 4 callbacks suppressed
[86887.602409] sd 7:0:0:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[86887.602412] sd 7:0:0:0: [sde] tag#0 Sense Key : Not Ready [current] 
[86887.602415] sd 7:0:0:0: [sde] tag#0 Add. Sense: Medium not present
[86887.602417] sd 7:0:0:0: [sde] tag#0 CDB: Read(10) 28 00 00 12 87 f0 00 00 08 00
[86887.602418] print_req_error: 160 callbacks suppressed
[86887.602420] blk_update_request: I/O error, dev sde, sector 1214448 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[86887.602908] sd 7:0:0:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[86887.602911] sd 7:0:0:0: [sde] tag#0 Sense Key : Not Ready [current] 
[86887.602913] sd 7:0:0:0: [sde] tag#0 Add. Sense: Medium not present
[86887.602914] sd 7:0:0:0: [sde] tag#0 CDB: Read(10) 28 00 00 01 08 40 00 00 08 00
[86887.602916] blk_update_request: I/O error, dev sde, sector 67648 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[86887.603343] sd 7:0:0:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[86887.603345] sd 7:0:0:0: [sde] tag#0 Sense Key : Not Ready [current] 
[86887.603346] sd 7:0:0:0: [sde] tag#0 Add. Sense: Medium not present
[86887.603347] sd 7:0:0:0: [sde] tag#0 CDB: Read(10) 28 00 00 12 87 f0 00 00 08 00
[86887.603348] blk_update_request: I/O error, dev sde, sector 1214448 op 0x0:(READ) flags 0x0 phys_seg 8 prio class 0

Adding a "time.sleep(5) between writing the image and switch to dut does help, but is of course not ideal.

Interesting is the Medium not present hint. I guess the writing just did not complete (on some level). Probably a sync or even better a proper eject is required? Is there an option in Labgrid to properly eject a USB mass storage device after flashing?

agners avatar Dec 10 '20 12:12 agners

@JPEWdev FYI.

agners avatar Dec 10 '20 12:12 agners

We already use oflag=direct for dd but this does not gurantee that data is synced to the underlying storage device. AFAIR oflag=sync made the transfer really slow.

Emantor avatar Dec 10 '20 13:12 Emantor

As the messages refer to read requests, I suspect that this is something that tries to read from the device after dd has closed it. You could use blktrace to check which program is responsible.

jluebbe avatar Dec 10 '20 13:12 jluebbe

Odd, I don't think I've ever seen this, but I also use bmaptool almost exclusively

JPEWdev avatar Dec 10 '20 14:12 JPEWdev

I just ran a test with DD and didn't see any errors

JPEWdev avatar Dec 10 '20 14:12 JPEWdev

FWIW, this is still the case for me using SDWire 1.4 devices, on Ubuntu 21.04 (Linux 5.11) using Labgrid master as of today.

agners avatar May 20 '21 19:05 agners

Weird. Labgrid invokes dd with conv=fdatasync which should ensure that all the data is written before dd exits, so I'm not sure whats going on. You can strengthening it to conv=fsync; I'm not sure how that could make a difference in this case, but it might be worth a try

JPEWdev avatar May 20 '21 19:05 JPEWdev

FWIW, this is still the case for me using SDWire 1.4 devices, on Ubuntu 21.04 (Linux 5.11) using Labgrid master as of today.

Are you still seeing the same error messages as in the original report? I'm asking because these are read errors and labgrid is only writing to the device. Please use blktrace to check which process is causing these requests.

jluebbe avatar May 21 '21 06:05 jluebbe

@jluebbe yes the same errors. Hm, this got me thinking, there is udisks running on that host, and I remember that on my Desktop udisks also rereads partition tables etc. when dd'ing a whole disk.

Using an udisk rule like this seems not to help though:

SUBSYSTEMS=="usb", ENV{ID_MODEL}=="SDWir_HS-SD_MMC", ENV{UDISKS_IGNORE}="1"

I then checked using btrace, there are quite some processes showing up after write: systemd-udevd, probe-bcache and udisksd. Not sure how those all get triggered, but I am guessing udev? If we could somehow prevent udev events from triggering I think then an immediate mux would be safer.

agners avatar May 21 '21 16:05 agners

Do you see any indication that some data that should be written is actually missing from the SD after switching it to the target?

If not, then I don't think there is anything labgrid could do to keep other processes from trying to read from the card after the image is written and then get interrupted by switching it away.

jluebbe avatar May 21 '21 18:05 jluebbe

Since other processes seem to read from the SD card in the SDWire, this is no labgrid issue. Closing this.

Bastian-Krause avatar Jul 23 '23 10:07 Bastian-Krause