ltfs
ltfs loops with 'LTFS30216W Length mismatch is detected.'
Describe the bug
OS: Ubuntu 24.04, latest master release.
MOST volumes in a 2000-tape changer mount, read and write just fine. Occasionally a volume is loaded that exhibits the following issue:
```
root@max-sfa-01:/usr/share/SFS/bin/NxCore# ltfs /ltfs/Tape4 -o devname=/dev/sg23
262dcf LTFS14000I LTFS starting, LTFS version 2.4.4.1 (Prelim), log level 2.
262dcf LTFS14058I LTFS Format Specification version 2.4.0.
262dcf LTFS14104I Launched by "ltfs /ltfs/Tape4 -o devname=/dev/sg23".
262dcf LTFS14105I This binary is built for Linux (x86_64).
262dcf LTFS14106I GCC version is 13.3.0.
262dcf LTFS17087I Kernel version: Linux version 6.8.0-58-generic (buildd@lcy02-amd64-040) (x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, GNU ld (GNU Binutils for Ubuntu) 2.42) #60-Ubuntu SMP PREEMPT_DYNAMIC Fri Mar 14 18:29:48 UTC 2025 i386.
262dcf LTFS17089I Distribution: PRETTY_NAME="Ubuntu 24.04.1 LTS".
262dcf LTFS17089I Distribution: DISTRIB_ID=Ubuntu.
262dcf LTFS14063I Sync type is "time", Sync time is 300 sec.
262dcf LTFS17085I Plugin: Loading "sg" tape backend.
262dcf LTFS17085I Plugin: Loading "unified" iosched backend.
262dcf LTFS14095I Set the tape device write-anywhere mode to avoid cartridge ejection.
262dcf LTFS30209I Opening a device through sg-ibmtape driver (/dev/sg23).
262dcf LTFS30250I Opened the SCSI tape device 10.0.12.0 (/dev/sg23).
262dcf LTFS30207I Vendor ID is IBM .
262dcf LTFS30208I Product ID is ULTRIUM-TD6 .
262dcf LTFS30214I Firmware revision is KAJ8.
262dcf LTFS30215I Drive serial is 1013005917.
262dcf LTFS30285I The reserved buffer size of /dev/sg23 is 1048576.
262dcf LTFS30294I Setting up timeout values from RSOC.
262dcf LTFS17160I Maximum device block size is 1048576.
262dcf LTFS11330I Loading cartridge.
262dcf LTFS30252I Logical block protection is disabled.
262dcf LTFS11332I Load successful.
262dcf LTFS17157I Changing the drive setting to write-anywhere mode.
262dcf LTFS11005I Mounting the volume.
262dcf LTFS30252I Logical block protection is disabled.
262dcf LTFS30216W Length mismatch is detected. (Act = 4096, resid = 0, resid_sense = -61440).
262dcf LTFS30216W Length mismatch is detected. (Act = 4096, resid = 0, resid_sense = -61440).
262dcf LTFS30216W Length mismatch is detected. (Act = 4096, resid = 0, resid_sense = -61440).
262dcf LTFS30216W Length mismatch is detected. (Act = 4096, resid = 0, resid_sense = -61440).
```
...and this loops forever (the process ran for over 12 hours and never exited).
To Reproduce
Very difficult, as this is a random volume among 2000+ volumes in a changer. I believe this only affects LTO6 media; the changer holds mixed LTO media types (LTO9, LTO7 and LTO6).
Expected behavior
The volume should mount, as it is formatted.
Desktop (please complete the following information):
- OS: Ubuntu 24.04
- Versions tried: 2.4.4, 2.4.7 and 2.5.0 (Prelim); all exhibit the same issue
Please try the following patch to avoid the infinite loop. It retries the read once when -EDEV_LENGTH_MISMATCH is detected and returns an error if the mismatch happens a second time. Obviously this does not fix the underlying problem, but with it ltfs returns a mount error immediately instead of looping.
```diff
diff --git a/src/tape_drivers/linux/sg/sg_tape.c b/src/tape_drivers/linux/sg/sg_tape.c
index 86d93e9..859c9c6 100644
--- a/src/tape_drivers/linux/sg/sg_tape.c
+++ b/src/tape_drivers/linux/sg/sg_tape.c
@@ -1926,7 +1926,7 @@ int sg_read(void *device, char *buf, size_t size,
 	int32_t ret = -EDEV_UNKNOWN;
 	struct sg_data *priv = (struct sg_data*)device;
 	size_t datacount = size;
-	struct tc_position pos_retry = {0, 0};
+	struct tc_position pos_retry = { .partition = 0, .block = TAPE_BLOCK_MAX};
 	int retry_count = 0;
 
 	ltfs_profiler_add_entry(priv->profiler, NULL, TAPEBEND_REQ_ENTER(REQ_TC_READ));
@@ -1954,7 +1954,7 @@ int sg_read(void *device, char *buf, size_t size,
 start_read:
 	ret = _cdb_read(device, buf, datacount, unusual_size);
 	if (ret == -EDEV_LENGTH_MISMATCH) {
-		if (pos_retry.partition || pos_retry.block) {
+		if (pos_retry.block != TAPE_BLOCK_MAX) {
 			/* Return error when retry is already executed */
 			sg_readpos(device, pos);
 			ltfs_profiler_add_entry(priv->profiler, NULL, TAPEBEND_REQ_EXIT(REQ_TC_READ));
```
Honestly, I have no idea how to fix this if the problem happens intermittently on a drive. That is, say the problem happens on drive A once, but does not happen the next time on the same drive. If that understanding is correct, I believe this is clearly an HBA or HBA driver error, not an LTFS problem.