
Failed to create file system when mds switch

Open YunhuiChen opened this issue 2 years ago • 2 comments

Describe the bug (描述bug)

Creating a file system fails during an mds switch.

To Reproduce (复现方法)

1. Create a filesystem in a loop with curvefs_tool.
2. Stop mds.
3. [error]

send create fs request to mds: 10.166.24.58:6700 failed, errorcode= 112, error text: [E111]Fail to connect Socket{id=0 addr=10.166.24.58:6700} (0x0x7f4a6ad681c0): Connection refused [R1][E112]Not connected to 10.166.24.58:6700 yet, server_id=0 [R2][E112]Not connected to 10.166.24.58:6700 yet, server_id=0 [R3][E112]Not connected to 10.166.24.58:6700 yet, server_id=0

send create fs request to mds: 10.166.24.59:6700 failed, errorcode= 112, error text: [E111]Fail to connect Socket{id=2 addr=10.166.24.59:6700} (0x0x7f4a6ad68640): Connection refused [R1][E112]Not connected to 10.166.24.59:6700 yet, server_id=2 [R2][E112]Not connected to 10.166.24.59:6700 yet, server_id=2 [R3][E112]Not connected to 10.166.24.59:6700 yet, server_id=2

send create fs request to mds: 10.166.24.35:6700 failed, errorcode= 112, error text: [E111]Fail to connect Socket{id=4 addr=10.166.24.35:6700} (0x0x7f4a6ad68ac0): Connection refused [R1][E112]Not connected to 10.166.24.35:6700 yet, server_id=4 [R2][E112]Not connected to 10.166.24.35:6700 yet, server_id=4 [R3][E112]Not connected to 10.166.24.35:6700 yet, server_id=4

Expected behavior (期望行为)

Versions (各种版本)

OS:
Compiler:
branch:
commit id:

Additional context/screenshots (更多上下文/截图)

YunhuiChen, Jul 01 '22 01:07

curvefs_tool uses the built-in retry framework of brpc, which is simple and cannot handle most situations. See the docs: https://github.com/apache/incubator-brpc/blob/master/docs/cn/client.md#%E9%87%8D%E8%AF%95

In this case, it did not trigger a retry at all, so the curvefs_tool RPC retry policy may need improvement. @Cyber-SiKu
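To make the retry idea concrete, here is a minimal sketch of one possible policy: try every mds address, and if none accepts the request, wait and repeat the whole pass so a newly elected mds has time to come up. This is not curvefs_tool or brpc code. `create_fs_with_failover`, `send_create_fs`, `max_rounds`, and `backoff_s` are hypothetical names, and the RPC itself is replaced by a plain callable that raises `ConnectionError` when an address cannot be reached.

```python
import time
from typing import Callable, Optional, Sequence


def create_fs_with_failover(
    mds_addrs: Sequence[str],
    send_create_fs: Callable[[str], bool],
    max_rounds: int = 5,
    backoff_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each mds address in turn and repeat the pass up to max_rounds times.

    Hypothetical helper: send_create_fs(addr) stands in for the real RPC.
    It returns True on success, False on a rejected request, and raises
    ConnectionError when the address is unreachable.
    Returns the address that accepted the request.
    """
    last_error: Optional[Exception] = None
    for round_no in range(max_rounds):
        for addr in mds_addrs:
            try:
                if send_create_fs(addr):
                    return addr
            except ConnectionError as exc:
                # This mds is down or not serving yet; move on to the next one.
                last_error = exc
        # No mds accepted the request in this pass. Back off (linearly) before
        # the next pass, but do not sleep after the final one.
        if round_no < max_rounds - 1:
            sleep(backoff_s * (round_no + 1))
    raise RuntimeError(
        f"create fs failed after {max_rounds} rounds: {last_error}"
    )
```

With the three addresses from the log above, a call would look like `create_fs_with_failover(["10.166.24.58:6700", "10.166.24.59:6700", "10.166.24.35:6700"], send)`, where `send` wraps the actual request. The linear backoff is only one choice; the right values depend on how long an mds switch takes in a given deployment.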

SeanHai, Jul 01 '22 03:07

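On the mds side, the metaserver heartbeat causes the following problem: after an mds switch, a fs can still be created as long as copyset resources have not been exhausted. If no metaserver has reported a heartbeat by the time the copyset resources are exhausted, the metaservers are considered unavailable and creating the fs fails. Once heartbeat reporting recovers, the fs can be created normally.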
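The heartbeat behavior described in this comment can be illustrated with a toy model. It is not Curve code: `HeartbeatTracker`, `can_create_fs`, `timeout_s`, and `free_copysets` are hypothetical names, and the rule that new copysets need at least one available metaserver is an assumption drawn from the description.

```python
from typing import Dict, Iterable


class HeartbeatTracker:
    """Toy model of how an mds might track metaserver liveness (hypothetical)."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        # Empty right after an mds switch: the new mds has seen no heartbeats.
        self.last_seen: Dict[str, float] = {}

    def on_heartbeat(self, metaserver_id: str, now: float) -> None:
        self.last_seen[metaserver_id] = now

    def is_available(self, metaserver_id: str, now: float) -> bool:
        seen = self.last_seen.get(metaserver_id)
        return seen is not None and now - seen <= self.timeout_s


def can_create_fs(
    free_copysets: int,
    tracker: HeartbeatTracker,
    metaservers: Iterable[str],
    now: float,
) -> bool:
    # While existing copysets remain, a new fs can be placed on them.
    if free_copysets > 0:
        return True
    # Assumption: once they run out, new copysets must be created, which
    # needs at least one metaserver that has reported a recent heartbeat.
    return any(tracker.is_available(m, now) for m in metaservers)


tracker = HeartbeatTracker(timeout_s=10.0)
metaservers = ["ms1", "ms2"]
print(can_create_fs(2, tracker, metaservers, now=0.0))  # copysets left
print(can_create_fs(0, tracker, metaservers, now=0.0))  # exhausted, no heartbeat yet
tracker.on_heartbeat("ms1", now=1.0)
print(can_create_fs(0, tracker, metaservers, now=2.0))  # heartbeat recovered
```

In this model, the failure window is the time between the mds switch and the first heartbeat, and it only matters once `free_copysets` reaches zero.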

Cyber-SiKu, Jul 11 '22 02:07