labgrid
labgrid copied to clipboard
Flashing with SDWire racy
Using slower SD cards I see kernel errors in dmesg at the very end of flashing:
[86887.602403] scsi_io_completion_action: 4 callbacks suppressed
[86887.602409] sd 7:0:0:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[86887.602412] sd 7:0:0:0: [sde] tag#0 Sense Key : Not Ready [current]
[86887.602415] sd 7:0:0:0: [sde] tag#0 Add. Sense: Medium not present
[86887.602417] sd 7:0:0:0: [sde] tag#0 CDB: Read(10) 28 00 00 12 87 f0 00 00 08 00
[86887.602418] print_req_error: 160 callbacks suppressed
[86887.602420] blk_update_request: I/O error, dev sde, sector 1214448 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[86887.602908] sd 7:0:0:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[86887.602911] sd 7:0:0:0: [sde] tag#0 Sense Key : Not Ready [current]
[86887.602913] sd 7:0:0:0: [sde] tag#0 Add. Sense: Medium not present
[86887.602914] sd 7:0:0:0: [sde] tag#0 CDB: Read(10) 28 00 00 01 08 40 00 00 08 00
[86887.602916] blk_update_request: I/O error, dev sde, sector 67648 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[86887.603343] sd 7:0:0:0: [sde] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE cmd_age=0s
[86887.603345] sd 7:0:0:0: [sde] tag#0 Sense Key : Not Ready [current]
[86887.603346] sd 7:0:0:0: [sde] tag#0 Add. Sense: Medium not present
[86887.603347] sd 7:0:0:0: [sde] tag#0 CDB: Read(10) 28 00 00 12 87 f0 00 00 08 00
[86887.603348] blk_update_request: I/O error, dev sde, sector 1214448 op 0x0:(READ) flags 0x0 phys_seg 8 prio class 0
Adding a "time.sleep(5) between writing the image and switch to dut does help, but is of course not ideal.
Interesting is the Medium not present hint. I guess the writing just did not complete (on some level). Probably a sync or even better a proper eject is required? Is there an option in Labgrid to properly eject a USB mass storage device after flashing?
@JPEWdev FYI.
We already use oflag=direct for dd but this does not gurantee that data is synced to the underlying storage device. AFAIR oflag=sync made the transfer really slow.
As the messages refer to read requests, I suspect that this is something that tries to read from the device after dd has closed it. You could use blktrace to check which program is responsible.
Odd, I don't think I've ever seen this, but I also use bmaptool almost exclusively
I just ran a test with DD and didn't see any errors
FWIW, this is still the case for me using SDWire 1.4 devices, on Ubuntu 21.04 (Linux 5.11) using Labgrid master as of today.
Weird. Labgrid invokes dd with conv=fdatasync which should ensure that all the data is written before dd exits, so I'm not sure whats going on. You can strengthening it to conv=fsync; I'm not sure how that could make a difference in this case, but it might be worth a try
FWIW, this is still the case for me using SDWire 1.4 devices, on Ubuntu 21.04 (Linux 5.11) using Labgrid master as of today.
Are you still seeing the same error messages as in the original report? I'm asking because these are read errors and labgrid is only writing to the device. Please use blktrace to check which process is causing these requests.
@jluebbe yes the same errors. Hm, this got me thinking, there is udisks running on that host, and I remember that on my Desktop udisks also rereads partition tables etc. when dd'ing a whole disk.
Using an udisk rule like this seems not to help though:
SUBSYSTEMS=="usb", ENV{ID_MODEL}=="SDWir_HS-SD_MMC", ENV{UDISKS_IGNORE}="1"
I then checked using btrace, there are quite some processes showing up after write: systemd-udevd, probe-bcache and udisksd. Not sure how those all get triggered, but I am guessing udev? If we could somehow prevent udev events from triggering I think then an immediate mux would be safer.
Do you see any indication that some data that should be written is actually missing from the SD after switching it to the target?
If not, then I don't think there is anything labgrid could do to keep other processes from trying to read from the card after the image is written and then get interrupted by switching it away.
Since other processes seem to read from the SD card in the SDWire, this is no labgrid issue. Closing this.