Derek Su
I think the purposes of a long timeout and EIO/EAGAIN are different. A long timeout can accommodate long latency caused by underlying networking or storage issues. However, once the timeout...
> > I think the purposes of a long timeout and EIO/EAGAIN are different. A long timeout can accommodate long latency caused by underlying networking or storage issues. However, once...
> Returning an error is dangerous as the upper layer may consider the block device as no longer available. For example, the filesystem may become read-only.

Yes. EAGAIN or EBUSY...
> Do you mean that Longhorn will return `EAGAIN` or `EBUSY` and retain the replicas when all replicas/the last replica has not triggered the timeout for MAX_RETRY times?

Only for...
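To make the question above concrete, here is a minimal sketch of one possible reading: the last replica is retained and a retryable error is returned until its timeout has fired `MAX_RETRY` times in a row. The class name `LastReplicaGuard`, the value of `MAX_RETRY`, and the fallback to `EIO` once the budget is spent are all assumptions for illustration, not Longhorn's actual implementation.

```python
import errno

# Hypothetical value; the thread does not state what MAX_RETRY is.
MAX_RETRY = 3


class LastReplicaGuard:
    """Counts consecutive timeouts on the last healthy replica."""

    def __init__(self, max_retry=MAX_RETRY):
        self.max_retry = max_retry
        self.timeouts = 0

    def on_timeout(self):
        """Record one more timeout and return the errno to surface."""
        self.timeouts += 1
        if self.timeouts < self.max_retry:
            # Retry budget not exhausted: keep the replica and
            # ask the caller to retry instead of failing the IO.
            return errno.EAGAIN
        # Assumption: after MAX_RETRY consecutive timeouts, report a hard error.
        return errno.EIO

    def on_success(self):
        """A successful IO resets the consecutive-timeout counter."""
        self.timeouts = 0
```

The counter is reset on success so that isolated latency spikes never add up to a hard failure; only consecutive timeouts do.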
After discussing with @innobead this morning, we have some thoughts on the timeouts and retries in the data path:

- Ping: The current implementation sets a ping timeout of 2...
Best effort in v1.7.0
> If we instead return SAM_STAT_BUSY, it seems the filesystem layer should interpret it as meaning we never accepted the command for processing. However, if we are just waiting on...
Another question: if the timeout between SCSI and iSCSI is 60 seconds and, for a 3-replica volume, the timeout of the first two replicas is 8 seconds, does it mean we...
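The numbers in this question can be worked through with a small helper. This is only a sketch: `remaining_budget` is a hypothetical function, and the thread does not say whether the two replica timeouts elapse one after another or overlap, so both cases are shown.

```python
def remaining_budget(upper_timeout_s, replica_timeout_s, timed_out_replicas,
                     sequential=True):
    """Seconds left of the upper-layer timeout after replica timeouts.

    sequential=True assumes each replica timeout elapses after the
    previous one; sequential=False assumes they all overlap.
    """
    if sequential:
        spent = replica_timeout_s * timed_out_replicas
    else:
        spent = replica_timeout_s if timed_out_replicas > 0 else 0
    return upper_timeout_s - spent


# 60 s SCSI/iSCSI timeout, two replicas timing out at 8 s each.
print(remaining_budget(60, 8, 2))                    # 44
print(remaining_budget(60, 8, 2, sequential=False))  # 52
```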
> > Another question If the timeout between SCSI and iSCSI is 60 seconds, for a 3-replica volume, the timeout of the first two replicas is 8 seconds, does it...
Your iSCSI service seems to be misbehaving, which leads to IO errors in Longhorn volumes. Can you check the dmesg output and the iSCSI service logs?