mhvtl icon indicating copy to clipboard operation
mhvtl copied to clipboard

mhvtl volumes losing their serial number?

Open rohr22 opened this issue 2 years ago • 16 comments

On our test and development systems, we use udev rules to identify the mhvtl volumes with a udev rule file like this:

KERNEL=="st*", SUBSYSTEM=="scsi_tape", ENV{ID_SERIAL}=="350223344ab000100", SYMLINK+="abcd/mhvtl_lto-6_01"
KERNEL=="st*", SUBSYSTEM=="scsi_tape", ENV{ID_SERIAL}=="350223344ab000200", SYMLINK+="abcd/mhvtl_lto-6_02"
KERNEL=="st*", SUBSYSTEM=="scsi_tape", ENV{ID_SERIAL}=="350223344ab000300", SYMLINK+="abcd/mhvtl_lto-6_03"
KERNEL=="st*", SUBSYSTEM=="scsi_tape", ENV{ID_SERIAL}=="350223344ab000400", SYMLINK+="abcd/mhvtl_lto-6_04"
KERNEL=="st*", SUBSYSTEM=="scsi_tape", ENV{ID_SERIAL}=="350223344ab000500", SYMLINK+="abcd/mhvtl_lto-6_05"
KERNEL=="st*", SUBSYSTEM=="scsi_tape", ENV{ID_SERIAL}=="350223344ab000600", SYMLINK+="abcd/mhvtl_lto-6_06"
KERNEL=="st*", SUBSYSTEM=="scsi_tape", ENV{ID_SERIAL}=="350223344ab000700", SYMLINK+="abcd/mhvtl_lto-6_07"
KERNEL=="st*", SUBSYSTEM=="scsi_tape", ENV{ID_SERIAL}=="350223344ab000800", SYMLINK+="abcd/mhvtl_lto-6_08"
KERNEL=="st*", SUBSYSTEM=="scsi_tape", ENV{ID_SERIAL}=="350223344ab000900", SYMLINK+="abcd/mhvtl_lto-6_09"
KERNEL=="st*", SUBSYSTEM=="scsi_tape", ENV{ID_SERIAL}=="350223344ab001000", SYMLINK+="abcd/mhvtl_lto-6_10"

where abcd = a short name for our project. After we run udevadm trigger, we can list the tape volumes like this:

# ls -la /dev/abcd | grep mhvtl However, I am finding that occasionally the mhvtl files listed under /dev/abcd are missing some entries, for example:

# ls -la /dev/abcd | grep mhvtl
lrwxrwxrwx  1 root root    6 Jun 25 03:23 mhvtl_lto-6_02 -> ../st8
lrwxrwxrwx  1 root root    6 Jun 25 03:22 mhvtl_lto-6_03 -> ../st4
lrwxrwxrwx  1 root root    6 Jun 25 03:23 mhvtl_lto-6_04 -> ../st6
lrwxrwxrwx  1 root root    6 Jun 25 03:22 mhvtl_lto-6_06 -> ../st5
lrwxrwxrwx  1 root root    6 Jun 25 03:22 mhvtl_lto-6_07 -> ../st2
lrwxrwxrwx  1 root root    6 Jun 25 03:22 mhvtl_lto-6_08 -> ../st3
lrwxrwxrwx  1 root root    6 Jun 25 03:23 mhvtl_lto-6_09 -> ../st1
lrwxrwxrwx  1 root root    6 Jun 25 03:23 mhvtl_lto-6_10 -> ../st0

Where are the links for mhvtl_lto-6_01 and mhvtl_lto-6_05?

# lsscsi |grep TD6
[2:0:1:0]    tape    IBM      ULT3580-TD6      0107  /dev/st9 
[2:0:2:0]    tape    IBM      ULT3580-TD6      0107  /dev/st8 
[2:0:3:0]    tape    IBM      ULT3580-TD6      0107  /dev/st4 
[2:0:4:0]    tape    IBM      ULT3580-TD6      0107  /dev/st6 
[2:0:5:0]    tape    IBM      ULT3580-TD6      0107  /dev/st7 
[2:0:6:0]    tape    IBM      ULT3580-TD6      0107  /dev/st5 
[2:0:7:0]    tape    IBM      ULT3580-TD6      0107  /dev/st2 
[2:0:8:0]    tape    IBM      ULT3580-TD6      0107  /dev/st3 
[2:0:9:0]    tape    IBM      ULT3580-TD6      0107  /dev/st1 
[2:0:10:0]   tape    IBM      ULT3580-TD6      0107  /dev/st0 

# for f in $(lsscsi |grep TD6 | awk '{print $NF}'); do
 echo "Processing $f"; udevadm info --name=$f --query=all |grep "ID_SERIAL="
done
Processing /dev/st9
Processing /dev/st8
E: ID_SERIAL=350223344ab000200
Processing /dev/st4
E: ID_SERIAL=350223344ab000300
Processing /dev/st6
E: ID_SERIAL=350223344ab000400
Processing /dev/st7
Processing /dev/st5
E: ID_SERIAL=350223344ab000600
Processing /dev/st2
E: ID_SERIAL=350223344ab000700
Processing /dev/st3
E: ID_SERIAL=350223344ab000800
Processing /dev/st1
E: ID_SERIAL=350223344ab000900
Processing /dev/st0
E: ID_SERIAL=350223344ab001000

Here is the output from running udevadm info on /dev/st9 which was missing the ID_SERIAL value:

udevadm info ---name=/dev/st9 --query=all
P: /devices/mhvtl_pseudo/adapter0/host2/target2:0:1/2:0:1:0/scsi_tape/st9
N: st9
E: DEVNAME=/dev/st9
E: DEVPATH=/devices/mhvtl_pseudo/adapter0/host2/target2:0:1/2:0:1:0/scsi_tape/st9
E: ID_BUS=scsi
E: ID_SCSI=1
E: MAJOR=9
E: MINOR=9
E: SUBSYSTEM=scsi_tape
E: USEC_INITIALIZED=1615028020641

One without the missing value for ID_SERIAL is shown:

# udevadm info --name=/dev/st8 --query=all
P: /devices/mhvtl_pseudo/adapter0/host2/target2:0:2/2:0:2:0/scsi_tape/st8
N: st8
S: hpss/mhvtl_lto-6_02
S: tape/by-id/scsi-350223344ab000200
E: DEVLINKS=/dev/hpss/mhvtl_lto-6_02 /dev/tape/by-id/scsi-350223344ab000200
E: DEVNAME=/dev/st8
E: DEVPATH=/devices/mhvtl_pseudo/adapter0/host2/target2:0:2/2:0:2:0/scsi_tape/st8
E: ID_BUS=scsi
E: ID_MODEL=ULT3580-TD6
E: ID_MODEL_ENC=ULT3580-TD6\x20\x20\x20\x20\x20
E: ID_REVISION=0107
E: ID_SCSI=1
E: ID_SCSI_SERIAL=XYZZY_A2
E: ID_SERIAL=350223344ab000200
E: ID_SERIAL_SHORT=50223344ab000200
E: ID_TYPE=tape
E: ID_VENDOR=IBM
E: ID_VENDOR_ENC=IBM\x20\x20\x20\x20\x20
E: ID_WWN=0x50223344ab000200
E: ID_WWN_WITH_EXTENSION=0x50223344ab000200
E: MAJOR=9
E: MINOR=8
E: SUBSYSTEM=scsi_tape
E: USEC_INITIALIZED=1615027810565

Our application tries to access the tape volume with one of the udev rules (mhvtl_lto-6_01 or mhvtl_lto-6_05 in the above example.) When the udev rule is lost, our application has failures.

What causes the mhvtl scsi device to lose its information such as the serial number and the model number? Is this a bug in the mhvtl code or an issue somewhere else?

I looked at the /var/log/messages file and the last entries with st9 were:

Jun 25 03:23:02 lews kernel: st 2:0:1:0: Attached scsi tape st9
Jun 25 03:23:02 lews kernel: st 2:0:1:0: st9: try direct i/o: yes (alignment 4 B)

And here are the entries just before our application reported an issue:

Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: ssc_tur(): Test Unit Ready (2922802) ** : No, No tape loaded
Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: completeSCSICommand(): s/n: (2922802), sz: 0, sam_status: 2 [02 3a 00]
Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: CDB (2922803) (delay 805): 12 00 00 00 60 00
Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: spc_inquiry(): INQUIRY ** (2922803)
Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: completeSCSICommand(): OP s/n: (2922803), sz: 64, sam_status: 0
Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: CDB (2922804) (delay 405): 00 00 00 00 00 00
Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: return_sense(): [Key/ASC/ASCQ] [02 3a 00]
Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: ssc_tur(): Test Unit Ready (2922804) ** : No, No tape loaded
Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: completeSCSICommand(): s/n: (2922804), sz: 0, sam_status: 2 [02 3a 00]
Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: CDB (2922805) (delay 405): 4d 00 51 00 00 00 00 00 0c 00
Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: ssc_log_sense(): LOG SENSE (2922805) ** :  Unknown code: 0x11
Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: return_sense(): [Key/ASC/ASCQ] [05 24 00] 0xc0 0002
Jun 25 03:28:57 lews /usr/bin/vtltape[4034858]: completeSCSICommand(): s/n: (2922805), sz: 0, sam_status: 2 [05 24 00]

Thank you, Peter

rohr22 avatar Jun 27 '22 21:06 rohr22

I got the mhvtl volumes for mhvtl_lto-6_01 and mhvtl_lto-6_05 to come back by running the udevadm trigger command, but why do I need to do that?

I found the following in the dmesg output shortly before out application reported an issue:

[2022-06-25 03:22:58] scsi 2:0:10:0: Sequential-Access IBM      ULT3580-TD6      0107 PQ: 0 ANSI: 5
[2022-06-25 03:22:58] st 2:0:10:0: Attached scsi tape st0
[2022-06-25 03:22:58] st 2:0:10:0: st0: try direct i/o: yes (alignment 4 B)
[2022-06-25 03:22:58] st 2:0:10:0: Attached scsi generic sg13 type 1
[2022-06-25 03:22:58] scsi 2:0:9:0: Sequential-Access IBM      ULT3580-TD6      0107 PQ: 0 ANSI: 5
[2022-06-25 03:22:58] st 2:0:9:0: Attached scsi tape st1
[2022-06-25 03:22:58] st 2:0:9:0: st1: try direct i/o: yes (alignment 4 B)
[2022-06-25 03:22:58] st 2:0:9:0: Attached scsi generic sg14 type 1
[2022-06-25 03:22:58] scsi 2:0:7:0: Sequential-Access IBM      ULT3580-TD6      0107 PQ: 0 ANSI: 5
[2022-06-25 03:22:58] st 2:0:7:0: Attached scsi tape st2
[2022-06-25 03:22:58] st 2:0:7:0: st2: try direct i/o: yes (alignment 4 B)
[2022-06-25 03:22:58] st 2:0:7:0: Attached scsi generic sg15 type 1
[2022-06-25 03:22:58] scsi 2:0:8:0: Sequential-Access IBM      ULT3580-TD6      0107 PQ: 0 ANSI: 5
[2022-06-25 03:22:58] st 2:0:8:0: Attached scsi tape st3
[2022-06-25 03:22:58] st 2:0:8:0: st3: try direct i/o: yes (alignment 4 B)
[2022-06-25 03:22:58] st 2:0:8:0: Attached scsi generic sg16 type 1
[2022-06-25 03:22:58] scsi 2:0:3:0: Sequential-Access IBM      ULT3580-TD6      0107 PQ: 0 ANSI: 5
[2022-06-25 03:22:59] st 2:0:3:0: Attached scsi tape st4
[2022-06-25 03:22:59] st 2:0:3:0: st4: try direct i/o: yes (alignment 4 B)
[2022-06-25 03:22:59] st 2:0:3:0: Attached scsi generic sg17 type 1
[2022-06-25 03:22:59] scsi 2:0:6:0: Sequential-Access IBM      ULT3580-TD6      0107 PQ: 0 ANSI: 5
[2022-06-25 03:22:59] st 2:0:6:0: Attached scsi tape st5
[2022-06-25 03:22:59] st 2:0:6:0: st5: try direct i/o: yes (alignment 4 B)
[2022-06-25 03:22:59] st 2:0:6:0: Attached scsi generic sg18 type 1
[2022-06-25 03:22:59] scsi 2:0:4:0: Sequential-Access IBM      ULT3580-TD6      0107 PQ: 0 ANSI: 5
[2022-06-25 03:22:59] synth uevent: /devices/vio: failed to send uevent
[2022-06-25 03:22:59] st 2:0:4:0: Attached scsi tape st6
[2022-06-25 03:22:59] st 2:0:4:0: st6: try direct i/o: yes (alignment 4 B)
[2022-06-25 03:22:59] synth uevent: /devices/vio/4000: failed to send uevent
[2022-06-25 03:22:59] vio 4000: uevent: failed to send synthetic uevent
[2022-06-25 03:22:59] synth uevent: /devices/vio/4001: failed to send uevent
[2022-06-25 03:22:59] vio 4001: uevent: failed to send synthetic uevent
[2022-06-25 03:22:59] synth uevent: /devices/vio/4002: failed to send uevent
[2022-06-25 03:22:59] vio 4002: uevent: failed to send synthetic uevent
[2022-06-25 03:22:59] synth uevent: /devices/vio/4004: failed to send uevent
[2022-06-25 03:22:59] vio 4004: uevent: failed to send synthetic uevent
[2022-06-25 03:22:59] st 2:0:4:0: Attached scsi generic sg19 type 1
[2022-06-25 03:23:00] scsi 2:0:5:0: Sequential-Access IBM      ULT3580-TD6      0107 PQ: 0 ANSI: 5
[2022-06-25 03:23:01] st 2:0:5:0: Attached scsi tape st7
[2022-06-25 03:23:01] st 2:0:5:0: st7: try direct i/o: yes (alignment 4 B)
[2022-06-25 03:23:01] st 2:0:5:0: Attached scsi generic sg20 type 1
[2022-06-25 03:23:01] scsi 2:0:2:0: Sequential-Access IBM      ULT3580-TD6      0107 PQ: 0 ANSI: 5
[2022-06-25 03:23:01] st 2:0:2:0: Attached scsi tape st8
[2022-06-25 03:23:01] st 2:0:2:0: st8: try direct i/o: yes (alignment 4 B)
[2022-06-25 03:23:01] st 2:0:2:0: Attached scsi generic sg21 type 1
[2022-06-25 03:23:01] scsi 2:0:1:0: Sequential-Access IBM      ULT3580-TD6      0107 PQ: 0 ANSI: 5
[2022-06-25 03:23:02] st 2:0:1:0: Attached scsi tape st9
[2022-06-25 03:23:02] st 2:0:1:0: st9: try direct i/o: yes (alignment 4 B)
[2022-06-25 03:23:02] st 2:0:1:0: Attached scsi generic sg22 type 1
[2022-06-25 03:23:02] scsi 2:0:0:0: Medium Changer    IBM      3584             0107 PQ: 0 ANSI: 5
[2022-06-25 03:23:02] ch 2:0:0:0: Power-on or device reset occurred
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] type #1 (mt): 0x2f0+1 [medium transport]
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] type #2 (st): 0x400+25 [storage]
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] type #3 (ie): 0x300+4 [import/export]
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] type #4 (dt): 0x1+10 [data transfer]
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] dt 0x1:
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] ID/LUN unknown
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] dt 0x2:
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] ID/LUN unknown
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] dt 0x3:
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] ID/LUN unknown
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] dt 0x4:
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] ID/LUN unknown
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] dt 0x5:
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] ID/LUN unknown
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] dt 0x6:
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] ID/LUN unknown
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] dt 0x7:
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] ID/LUN unknown
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] dt 0x8:
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] ID/LUN unknown
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] dt 0x9:
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] ID/LUN unknown
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] dt 0xa:
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] ID/LUN unknown
[2022-06-25 03:23:02] ch 2:0:0:0: [ch0] INITIALIZE ELEMENT STATUS, may take some time ...
[2022-06-25 03:23:03] ch 2:0:0:0: [ch0] ... finished
[2022-06-25 03:23:03] ch 2:0:0:0: Attached scsi changer ch0
[2022-06-25 03:23:03] ch 2:0:0:0: Attached scsi generic sg23 type 8
[2022-06-25 03:28:55] st 2:0:10:0: Power-on or device reset occurred
[2022-06-25 03:28:55] st 2:0:9:0: Power-on or device reset occurred
[2022-06-25 03:28:56] st 2:0:8:0: Power-on or device reset occurred
[2022-06-25 03:28:56] st 2:0:7:0: Power-on or device reset occurred
[2022-06-25 03:28:56] st 2:0:6:0: Power-on or device reset occurred

rohr22 avatar Jun 27 '22 22:06 rohr22

Thanks for the bug report.

I'm pretty sure the issue is with mhVTL - the udev team would have fixed any discrepancies like this a long time ago.

I'm looking at it now and suspect it is a race condition where the logical unit is being created before the vtltape daemon is ready to actually process any SCSI commands - hence if the kernel issues 'inquiry' to the logical unit before it is fully setup, the responses are incomplete.

markh794 avatar Jul 01 '22 04:07 markh794

Hello Peter,

Just to confirm this is a suspect race condition at startup, are you able to run the following command to reload udev rule set if you see the above condition occur udevadm control -R udevadm trigger

markh794 avatar Jul 04 '22 02:07 markh794

Hi, Mark.

Yes, I ran the above and then saw the complete udev rules. I had these mhvtl volumes before running the above:

lrwxrwxrwx  1 root root    6 Jul 14 22:34 mhvtl_lto-6_01 -> ../st0
lrwxrwxrwx  1 root root    6 Jul 14 22:34 mhvtl_lto-6_03 -> ../st9
lrwxrwxrwx  1 root root    6 Jul 14 22:34 mhvtl_lto-6_04 -> ../st5
lrwxrwxrwx  1 root root    6 Jul 14 22:34 mhvtl_lto-6_05 -> ../st3
lrwxrwxrwx  1 root root    6 Jul 14 22:34 mhvtl_lto-6_06 -> ../st8
lrwxrwxrwx  1 root root    6 Jul 14 22:34 mhvtl_lto-6_07 -> ../st1
lrwxrwxrwx  1 root root    6 Jul 14 22:34 mhvtl_lto-6_08 -> ../st7
lrwxrwxrwx  1 root root    6 Jul 14 22:34 mhvtl_lto-6_09 -> ../st2
lrwxrwxrwx  1 root root    6 Jul 14 22:34 mhvtl_lto-6_10 -> ../st4

and then these afterward:

lrwxrwxrwx  1 root root    6 Jul 14 23:53 mhvtl_lto-6_01 -> ../st0
lrwxrwxrwx  1 root root    6 Jul 14 23:53 mhvtl_lto-6_02 -> ../st6
lrwxrwxrwx  1 root root    6 Jul 14 23:53 mhvtl_lto-6_03 -> ../st9
lrwxrwxrwx  1 root root    6 Jul 14 23:53 mhvtl_lto-6_04 -> ../st5
lrwxrwxrwx  1 root root    6 Jul 14 23:53 mhvtl_lto-6_05 -> ../st3
lrwxrwxrwx  1 root root    6 Jul 14 23:53 mhvtl_lto-6_06 -> ../st8
lrwxrwxrwx  1 root root    6 Jul 14 23:53 mhvtl_lto-6_07 -> ../st1
lrwxrwxrwx  1 root root    6 Jul 14 23:53 mhvtl_lto-6_08 -> ../st7
lrwxrwxrwx  1 root root    6 Jul 14 23:53 mhvtl_lto-6_09 -> ../st2
lrwxrwxrwx  1 root root    6 Jul 14 23:53 mhvtl_lto-6_10 -> ../st4

So after doing these commands:

udevadm control -R
udevadm trigger

the link for mhvtl_lto-6_02 to st6 appeared. I looked in /var/log/messages and found this shortly after the reboot when the udev rules should have been run:

Jul 14 22:34:25 lews kernel: st 2:0:2:0: Attached scsi tape st6
Jul 14 22:34:25 lews kernel: st 2:0:2:0: st6: try direct i/o: yes (alignment 4 B)
Jul 14 22:34:25 lews kernel: st 2:0:2:0: Attached scsi generic sg19 type 1
Jul 14 22:34:25 lews /usr/bin/vtltape[2342]: add_lu(): Wrote 'add 12 0 2 0#012' (13 bytes) to /sys/bus/mhvtl/drivers/mhvtl/add_lu
Jul 14 22:34:25 lews /usr/bin/vtltape[2342]: add_lu(): Child anounces 'lu [0:2:0] created'.
Jul 14 22:34:25 lews /usr/bin/vtltape[2240]: main(): Cleaning up after add_lu child pid: 2342

I have been noticing this issue for several years but just got around to reporting it with this issue.

Thank you, Peter

rohr22 avatar Jul 15 '22 05:07 rohr22

Hi Peter, I must admit to having seen this issue from time-to-time (but not a user of custom udev rules as such - hasn't bothered me).

However, a whole heap of theories have been thrown out over the last couple of weeks and I have finally identified the cause. (systemd / udev in debug mode without any rate limiting)

Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: 'scsi_id --whitelisted --export --device=/dev/bsg/5:0:2:0'(err) 'scsi_id: cannot open /dev/bsg/5:0:2:0: No such file or directory' Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: 'scsi_id --whitelisted --export --device=/dev/bsg/5:0:2:0'(out) 'ID_SCSI=1' Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: 'scsi_id --whitelisted --export --device=/dev/bsg/5:0:2:0'(out) 'ID_VENDOR=' Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: 'scsi_id --whitelisted --export --device=/dev/bsg/5:0:2:0'(out) 'ID_VENDOR_ENC=' Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: 'scsi_id --whitelisted --export --device=/dev/bsg/5:0:2:0'(out) 'ID_MODEL=' Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: 'scsi_id --whitelisted --export --device=/dev/bsg/5:0:2:0'(out) 'ID_MODEL_ENC=' Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: 'scsi_id --whitelisted --export --device=/dev/bsg/5:0:2:0'(out) 'ID_REVISION=' Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: 'scsi_id --whitelisted --export --device=/dev/bsg/5:0:2:0'(out) 'ID_TYPE=' Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: Process 'scsi_id --whitelisted --export --device=/dev/bsg/5:0:2:0' succeeded. Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: IMPORT builtin 'path_id' /usr/lib/udev/rules.d/60-persistent-storage-tape.rules:25 Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: IMPORT builtin 'path_id' returned non-zero Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: handling device node '/dev/nst7', devnum=c9:135, mode=0660, uid=0, gid=33 Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: set permissions /dev/nst7, 020660, uid=0, gid=33 Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: creating symlink '/dev/char/9:135' to '../nst7' Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: seq 2848 processed Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19428]: seq 2855 running Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19428]: IMPORT builtin 'hwdb' /usr/lib/udev/rules.d/50-udev-default.rules:14 Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19428]: IMPORT builtin 'hwdb' returned non-zero Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19428]: seq 2855 processed Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19428]: seq 2856 running Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19428]: seq 2856 processed Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: seq 2858 running Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: handling device node '/dev/bsg/5:0:2:0', devnum=c247:16, mode=0600, uid=0, gid=0 Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: preserve permissions /dev/bsg/5:0:2:0, 020600, uid=0, gid=0 Jul 21 11:26:21 mhvtldev-localdomain systemd-udevd[19429]: creating symlink '/dev/char/247:16' to '../bsg/5:0:2:0'

i.e. the udev SCSI tape rule is attempting to access the passthrough path before udev has created the device.

Now to figure out a way to encourage udev to play nice and setup /dev/bsg/ paths before the persistent tape rules :)

Ideas and suggestions most welcome.

markh794 avatar Jul 21 '22 01:07 markh794

FWIW: 'udevadm trigger' once the /dev/bsg/* devices are created will re-create any missing paths by your udev rules.

markh794 avatar Jul 21 '22 02:07 markh794

Yeah, we use udevadm trigger to recreate the udev paths and I think that nearly always puts back the missing paths.. However, I think some time later additional paths might be missing.

rohr22 avatar Jul 21 '22 20:07 rohr22

Looking into udev/systemd logs show 'scsi_id' being unable to open /dev/bsg/xxx ( hence unable to retrieve scsi parameters) because the persistent tape rules are run before the rules to create /dev/bsg/* and /dev/sg* get executed

markh794 avatar Jul 22 '22 12:07 markh794

FWIW: I recently 'yum update -y' and the udev issues I was seeing has gone away.. I do see that 'systemd-udev' package was updated (I didn't capture the version before hand). systemd-udev-239-60.el8.x86_64

markh794 avatar Jul 27 '22 01:07 markh794

To debug udev rules: Update /etc/udev/udev.conf udev_log="debug" Don't start mhvtl on boot: systemctl disable mhvtl.target Restart the system: In one terminal - monitor systemd via journalctl -f In another terminal - Start mhVTL systemctl start mhvtl.target Flip back to the 'journalctl -f' session to view the behaviour of udev. If you are still encountering this issue, please capture all the 'journalctl -f' output and make it available. Cheers Mark

markh794 avatar Jul 27 '22 01:07 markh794

Hi, Mark.

I updated four of our test systems to have udev_log="debug" in the /etc/udev/udev.conf file. However, I don't understand why I have to disable mhvtl.target. The next time I notice that we don't have all the entries expected from the udev rules, cannot I look in the /var/log/messages file to see what might have gone wrong and share any relevant output with you at that time?

Thank you, Peter

rohr22 avatar Jul 28 '22 22:07 rohr22

Even though udev is in debug, but it does not appear to make it to /var/log/messages The journalctl -f seems to be the only way to capture the info (not that I've gone looking too hard). The disable start of mhvtl.target is just so a terminal journalctl -f can be run before the start of mhvtl.target

markh794 avatar Jul 29 '22 03:07 markh794

BTW: What distro and version are you running ? I get no problems with 'openSUSE Leap' I'm just re-checking CentOS7 now...

markh794 avatar Jul 29 '22 03:07 markh794

We are using RHEL 8 but this problem also happens on RHEL 7 machines.

rohr22 avatar Jul 29 '22 04:07 rohr22

Even though udev is in debug, but it does not appear to make it to /var/log/messages The journalctl -f seems to be the only way to capture the info (not that I've gone looking too hard). The disable start of mhvtl.target is just so a terminal journalctl -f can be run before the start of mhvtl.target

Oh, I did not know that it would not make it to /var/log/messages. I was hoping to let mhvtl.target run automatically and then when the problem occurs take a look at the journalctl command to see what happened. Won't that work also?

rohr22 avatar Jul 29 '22 04:07 rohr22

Most likely I try it out a little later

markh794 avatar Jul 29 '22 04:07 markh794