Leonid S. Usov

181 comments by Leonid S. Usov

On further thought, I conclude that @batrick's original approach was the correct one given the current quiesce protocol design. I tried to capture it in the comment in...

The latest run: https://pulpito.ceph.com/leonidus-2024-05-18_16:39:02-fs-wip-lusov-quiesce-distro-default-smithi/
No quiesce timeouts, only EMEDIUMTYPE:
```
$ grep "ERROR:tasks.quiescer" */teuthology.log
7712672/teuthology.log:2024-05-18T17:28:13.461 ERROR:tasks.quiescer.fs.[cephfs]:Couldn't parse response with error Expecting value: line 1 column 1 (char 0); rc: 124...
```

OK, the issue here is that I'm calling `dispatch_fragment_dir` synchronously from a callback that runs as part of the same request's cleanup:
```
-239> 2024-05-18T17:26:32.305+0000 7f31d3332700 15 mds.0.cache request_cleanup request(mds.0:86320...
```
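The re-entrancy problem above can be illustrated with a toy model. This is a minimal Python sketch, not Ceph's MDS code: `ToyCache`, `request_cleanup` and `dispatch_fragment` are hypothetical stand-ins, and deferring work to a queue that is drained once no cleanup is on the stack is just one common way to break this kind of re-entrancy, not necessarily how the actual fix was done.

```python
from __future__ import annotations

from collections import deque
from typing import Callable


class ToyCache:
    """Hypothetical stand-in for an MDS cache; NOT Ceph's MDCache."""

    def __init__(self) -> None:
        self._cleanup_depth = 0  # > 0 while inside request_cleanup (handles nesting)
        self._deferred: deque[Callable[[], None]] = deque()
        self.log: list[str] = []

    def request_cleanup(self, req_id: str, on_cleanup: Callable[[], None]) -> None:
        self._cleanup_depth += 1
        try:
            self.log.append(f"cleanup start {req_id}")
            on_cleanup()  # completion callbacks fire while the request is torn down
            self.log.append(f"cleanup end {req_id}")
        finally:
            self._cleanup_depth -= 1
        if self._cleanup_depth == 0:
            self._drain()

    def dispatch_fragment(self, dirname: str) -> None:
        # Hypothetical stand-in for the dispatch_fragment_dir call mentioned above.
        if self._cleanup_depth > 0:
            # Dispatching here would re-enter request processing on state the
            # enclosing cleanup is still tearing down, so queue it instead.
            self._deferred.append(lambda: self.dispatch_fragment(dirname))
            return
        self.log.append(f"dispatch {dirname}")

    def _drain(self) -> None:
        # Runs only after the outermost cleanup has finished.
        while self._deferred:
            self._deferred.popleft()()
```

Calling `cache.request_cleanup("req1", lambda: cache.dispatch_fragment("dirA"))` records `dispatch dirA` only after `cleanup end req1`, so the dispatch never runs inside the cleanup that triggered it.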

I ran two new job sets with the latest version:
1. https://pulpito.ceph.com/leonidus-2024-05-19_09:55:47-fs-wip-lusov-quiesce-distro-default-smithi/
```
[leonidus@vossi04 leonidus-2024-05-19_09:55:47-fs-wip-lusov-quiesce-distro-default-smithi]$ grep "ERROR:tasks.quiescer" */teuthology.log
7713401/teuthology.log:2024-05-19T10:26:54.953 ERROR:tasks.quiescer.fs.[cephfs]:exception:
7713402/teuthology.log:2024-05-19T10:38:05.029 ERROR:tasks.quiescer.fs.[cephfs]:Couldn't release set '3d4f6a9f' with rc: 1 (EPERM), stdout:...
```

2. https://pulpito.ceph.com/leonidus-2024-05-19_10:35:10-fs-wip-lusov-quiesce-distro-default-smithi/
```
[leonidus@vossi04 leonidus-2024-05-19_10:35:10-fs-wip-lusov-quiesce-distro-default-smithi]$ grep "ERROR:tasks.quiescer" */teuthology.log
7713463/teuthology.log:2024-05-19T11:15:37.944 ERROR:tasks.quiescer.fs.[cephfs]:Couldn't quiesce root with rc: 110 (ETIMEDOUT), stdout:
7713463/teuthology.log:2024-05-19T11:15:37.944 ERROR:tasks.quiescer.fs.[cephfs]:exception:
```
No outstanding quiesce ops:
```
$ grep "Outstanding" 7713463/teuthology.log
2024-05-19T11:14:54.323...
```
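When comparing runs like the two above, it can help to tally the quiescer errors per job. This is a minimal Python sketch assuming the `grep` output format shown above (`<job>/teuthology.log:<timestamp> ERROR:tasks.quiescer...` with an `rc: <n>` fragment optionally followed by `(<ENAME>)`); `tally_quiescer_errors` is a hypothetical helper, not part of teuthology.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# "rc: <n>" optionally followed by " (ENAME)", as in the quiescer error lines above.
RC_RE = re.compile(r"rc: (\d+)(?: \((E[A-Z]+)\))?")
# Leading "<job id>/teuthology.log:" prefix that grep adds for */teuthology.log.
JOB_RE = re.compile(r"^(\d+)/teuthology\.log:")


def tally_quiescer_errors(
    lines: Iterable[str],
) -> Counter[Tuple[str, int, Optional[str]]]:
    """Count (job_id, rc, errno_name) triples in quiescer error lines."""
    counts: Counter[Tuple[str, int, Optional[str]]] = Counter()
    for line in lines:
        if "ERROR:tasks.quiescer" not in line:
            continue
        job = JOB_RE.match(line)
        rc = RC_RE.search(line)
        if job is None or rc is None:
            continue  # e.g. the bare "exception:" lines carry no rc
        counts[(job.group(1), int(rc.group(1)), rc.group(2))] += 1
    return counts
```

Feeding it saved `grep` output (for example, `tally_quiescer_errors(open("quiescer-errors.txt"))`, with a hypothetical file name) separates timeouts such as `(…, 110, 'ETIMEDOUT')` from other failures like `(…, 1, 'EPERM')`; lines whose rc has no errno name are counted with `None`.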

Of the two runs above, only one shows a new, genuine quiesce timeout, and it appears to be an interlock with an export: https://pulpito.ceph.com/leonidus-2024-05-19_09:55:47-fs-wip-lusov-quiesce-distro-default-smithi/7713434/ All of the pending quiesce_inode ops are failing to authpin:...

The arm64 failure is addressed by https://github.com/ceph/ceph/pull/57552