vitastor icon indicating copy to clipboard operation
vitastor copied to clipboard

[vitastor-osd] Slow op from client

Open lnsyyj opened this issue 3 years ago • 5 comments

Hi @vitalif ,

When I use the fio test book sequence to write 64k in the qemu virtual machine, numjobs=16, iodepth=4, Slow op from client... appears, then fio writes hang...

This is an occasional problem, and it does not happen 100%.

Oct 21 14:11:16 vita-3 vitastor-osd[1535]: [OSD 3] avg latency for subop 15 (ping): 117 us
Oct 21 14:11:19 vita-3 vitastor-osd[1535]: [OSD 3] avg latency for subop 15 (ping): 113 us
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 8: primary_write id=18910 inode=1000000000013 offset=c08f55000 len=6000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434829 1000000000017:4160000 v986 offset=10000 len=10000 state=0 wait=3 (detail=5152768)
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434831 1000000000017:4480000 v1012 offset=10000 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434833 1000000000017:4360000 v1007 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434835 1000000000017:3e40000 v1010 offset=10000 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434837 1000000000017:4100000 v979 offset=10000 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434839 1000000000017:4080000 v1019 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434841 1000000000017:3e60000 v1055 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434843 1000000000017:3d60000 v1050 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434845 1000000000017:3f00000 v1026 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434847 1000000000017:4400000 v1025 offset=10000 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434849 1000000000017:4300000 v1040 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434851 1000000000017:4240000 v1104 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434853 1000000000017:3f60000 v1062 offset=10000 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434855 1000000000017:3e00000 v1031 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434857 1000000000017:3e80000 v1044 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434859 1000000000017:3d00000 v1029 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434861 1000000000017:4000000 v1060 offset=10000 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434863 1000000000017:4260000 v1054 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434865 1000000000017:4440000 v993 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 9: write_stable id=41434872 1000000000013:140b000000 v342 offset=0 len=1000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 13: write_stable id=26818522 1000000000017:3ce0000 v1044 offset=10000 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 13: write_stable id=26818524 1000000000017:40a0000 v1052 offset=10000 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 13: write_stable id=26818526 1000000000017:4020000 v1076 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 13: write_stable id=26818528 1000000000017:4420000 v1003 offset=10000 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 13: write_stable id=26818530 1000000000017:45c0000 v983 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 13: write_stable id=26818532 1000000000017:3e20000 v993 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 13: write_stable id=26818534 1000000000017:44e0000 v941 offset=0 len=10000 state=0
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873223 inode=1000000000017 offset=42d0000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873225 inode=1000000000017 offset=3ed0000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873226 inode=1000000000017 offset=3ff0000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873242 inode=1000000000017 offset=3d50000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873245 inode=1000000000017 offset=3cd0000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873254 inode=1000000000017 offset=3df0000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873255 inode=1000000000017 offset=4590000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873267 inode=1000000000017 offset=44d0000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873272 inode=1000000000017 offset=4190000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873276 inode=1000000000017 offset=4130000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873279 inode=1000000000017 offset=3f80000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873285 inode=1000000000017 offset=45b0000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873293 inode=1000000000017 offset=3f20000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873294 inode=1000000000017 offset=4320000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873295 inode=1000000000017 offset=41a0000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873296 inode=1000000000017 offset=4380000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873297 inode=1000000000017 offset=4390000 len=10000 state=1
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873300 inode=1000000000017 offset=3d80000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873302 inode=1000000000017 offset=4140000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873304 inode=1000000000017 offset=41b0000 len=10000 state=1
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873305 inode=1000000000017 offset=3f90000 len=10000 state=1
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873307 inode=1000000000017 offset=3fa0000 len=10000 state=4
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: [OSD 3] Slow op from client 14: primary_write id=115873310 inode=1000000000017 offset=4150000 len=10000 state=1
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: Journal: used_start=004ea000 next_free=004ca000 dirty_start=004b9000 trim_to=004ea000 trim_to_refs=1
Oct 21 14:11:22 vita-3 vitastor-osd[1535]: Flusher: queued=2 first=dirty,1000000000017:3bc0000 trim_wanted=1 dequeuing=0 trimming=0 cur=1 target=1 active=0 syncing=0
Oct 21 14:11:25 vita-3 vitastor-osd[1535]: [OSD 3] avg latency for subop 15 (ping): 105 us

lnsyyj avatar Oct 21 '21 14:10 lnsyyj

Can you uncomment #define BLOCKSTORE_DEBUG in blockstore_impl.h, rebuild and reproduce the bug, and attach debug logs?

vitalif avatar Oct 30 '21 10:10 vitalif

One question - is your test setup configured without immediate_commit? If so then writes without fsync may hang. I was thinking it's ok because all clients do fsync sometimes. But now I'm thinking that maybe I was wrong and it should be fixed :-)

vitalif avatar Oct 30 '21 10:10 vitalif

OK, I fixed that problem with immediate_commit=none in master by introducing an "automatic sync" interval :-)

vitalif avatar Oct 30 '21 11:10 vitalif

Yes, there is --immediate_commit all in the OSD systemd service file.

Has the problem been solved?

lnsyyj avatar Nov 04 '21 09:11 lnsyyj

I don't think so because my fix was related to setups without immediate_commit all. Can you reproduce the bug with BLOCKSTORE_DEBUG?

vitalif avatar Dec 01 '21 09:12 vitalif