libiscsi icon indicating copy to clipboard operation
libiscsi copied to clipboard

iscsi_read16_sync hangs and spins cpu if hung up on

Open nixomose opened this issue 8 years ago • 2 comments

I traced through it a bit, if you're doing a read and you pull the network plug out, the event loop checks for results from poll. It correctly picks up the POLLHUP and then attempts to reconnect. Through various callings, I think it was gethostbyname, I forget exactly, some part of the reconnect fails, and the end result is that 0 is returned, all the way up the chain back to the event loop. So then poll immediately returns with POLLHUP, but now since iscsi_service checks for fd<0 and returns right away, nothing notices the POLLHUP and then it goes back to poll which returns immediately and that's what spins the cpu, and it never gets out of that loop.

nixomose avatar Feb 16 '17 17:02 nixomose

the problem is here... if (iscsi->old_iscsi) { if (!iscsi->pending_reconnect) { iscsi_reconnect_cb(iscsi, SCSI_STATUS_ERROR, NULL, NULL); } return 0; } old_iscsi exists because it was created by the reconnect attempt, it is not pending reconnect, it calls iscsi_reconnect_cb and regardless of the outcome, 0 is returned, which gets the caller to think everything is okay, and its not.

nixomose avatar Feb 16 '17 19:02 nixomose

I can reproduce the issue on aa617d1 with this patch and running tests/tests_9012_read16.sh.

diff --git a/test-tool/test_read16_simple.c b/test-tool/test_read16_simple.c
index 30a882d..0c8b47e 100644
--- a/test-tool/test_read16_simple.c
+++ b/test-tool/test_read16_simple.c
@@ -39,6 +39,11 @@ test_read16_simple(void)
                 if (maximum_transfer_length && maximum_transfer_length < i) {
                         break;
                 }
+
+                if (i == 10) {
+                    sleep(1);
+                }
+
                 READ16(sd, NULL, 0, i * block_size,
                        block_size, 0, 0, 0, 0, 0, NULL,
                        EXPECT_STATUS_GOOD);
diff --git a/tests/test_9012_read16.sh b/tests/test_9012_read16.sh
index f201c94..965d651 100755
--- a/tests/test_9012_read16.sh
+++ b/tests/test_9012_read16.sh
@@ -8,7 +8,9 @@ start_target
 create_disk_lun 1 100M
 
 echo -n "SCSI.Read16 ... "
-../test-tool/iscsi-test-cu -i ${IQNINITIATOR} iscsi://${TGTPORTAL}/${IQNTARGET}/1 -t SCSI.Read16 --dataloss > /dev/null || failure
+#../test-tool/iscsi-test-cu -i ${IQNINITIATOR} iscsi://${TGTPORTAL}/${IQNTARGET}/1 -t SCSI.Read16 --dataloss > /dev/null || failure
+../test-tool/iscsi-test-cu -i ${IQNINITIATOR} iscsi://${TGTPORTAL}/${IQNTARGET}/1 -t SCSI.Read16 --dataloss > /dev/null &
+sleep 0.5
 success
 
 shutdown_target

Watching htop, after the script runs iscsi-test-cu consumes a core, spinning on poll.

crawfxrd avatar Dec 12 '17 00:12 crawfxrd