
index: Optimize index xlator code to improve file creation

mohit84 opened this issue 2 years ago

The afr/ec xlators use the internal xattrop fop to maintain consistency for data transactions. When the index xlator receives a xattrop wind request, it adds or deletes the gfid path under the indices directory specific to that gfid. For an index add the index xlator calls sys_link, and for a delete it calls sys_rename and sys_unlink depending on the index type. After reading a statedump I have observed that the index xlator adds significant latency to xattrop in the case of the create (with data) fop.
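For context, here is a minimal sketch of the path-based pattern described above (not the actual xlator code; the helper names index_add/index_del, the fixed 4096-byte buffers, and the paths are illustrative only). Every add/delete builds the full path under the indices directory and hands it to link()/unlink(), so each xattrop pays for string construction plus a full path lookup in the kernel:

```c
#include <stdio.h>
#include <unistd.h>

/* add an index entry by hard-linking the base file to the gfid name */
int index_add(const char *indices_dir, const char *base, const char *gfid)
{
    char oldpath[4096], newpath[4096];
    snprintf(oldpath, sizeof(oldpath), "%s/%s", indices_dir, base);
    snprintf(newpath, sizeof(newpath), "%s/%s", indices_dir, gfid);
    return link(oldpath, newpath);   /* sys_link on full paths */
}

/* delete an index entry by unlinking the gfid name */
int index_del(const char *indices_dir, const char *gfid)
{
    char path[4096];
    snprintf(path, sizeof(path), "%s/%s", indices_dir, gfid);
    return unlink(path);             /* sys_unlink on full path */
}
```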

Solution: The index xlator uses the xattrop, entry-changes and dirty directories under indices (.glusterfs/indices) to link/unlink gfids, so keep an fd open for these index parent directories and use the *at-based system calls (linkat, renameat, fstatat, unlinkat, openat).
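A minimal sketch of the proposed approach, assuming an illustrative brick path, base index file name and gfid rather than the real xlator internals: open the index parent directory once, cache the fd, and drive linkat/fstatat/unlinkat relative to it so no full path is resolved per operation:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    const char *index_dir = "/bricks/brick1/.glusterfs/indices/xattrop"; /* assumed brick path */
    const char *base_name = "xattrop-0a1b2c3d";                          /* hypothetical base index file */
    const char *gfid_name = "00000000-0000-0000-0000-0000000000aa";      /* hypothetical gfid entry */

    /* Open the parent directory once; the fd can be cached for the
     * lifetime of the brick process and reused for every xattrop. */
    int dirfd = open(index_dir, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) {
        perror("open");
        return 1;
    }

    /* Add an index entry: hard-link the base file to the gfid name,
     * both resolved relative to the already open directory fd. */
    if (linkat(dirfd, base_name, dirfd, gfid_name, 0) < 0 && errno != EEXIST)
        perror("linkat");

    /* Check the entry without rebuilding the full path. */
    struct stat st;
    if (fstatat(dirfd, gfid_name, &st, AT_SYMLINK_NOFOLLOW) == 0)
        printf("index entry present, inode %ju\n", (uintmax_t)st.st_ino);

    /* Remove the index entry relative to the same fd. */
    if (unlinkat(dirfd, gfid_name, 0) < 0)
        perror("unlinkat");

    close(dirfd);
    return 0;
}
```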

mohit84 avatar Jun 21 '22 13:06 mohit84

As we can see in the statedump below, the average FXATTROP latency at the index xlator level is significantly higher than at the posix xlator level.

test1-server.latency.FXATTROP=AVG:1428735.799834 CNT:160002 TOTAL:228600585445 MIN:51762 MAX:1047270802
GlusterFS 4.x v1.latency.FXATTROP=AVG:1436028.395024 CNT:159993 TOTAL:229754491005 MIN:56419 MAX:1047281163
test1-io-stats.latency.FXATTROP=AVG:1423384.600592 CNT:160002 TOTAL:227744382864 MIN:48214 MAX:1047266904
test1-index.latency.FXATTROP=AVG:1421981.479707 CNT:160002 TOTAL:227519880716 MIN:47221 MAX:1047265295
test1-io-threads.latency.FXATTROP=AVG:34805.602880 CNT:160002 TOTAL:5568966072 MIN:17143 MAX:133640095
test1-upcall.latency.FXATTROP=AVG:23491.527643 CNT:160002 TOTAL:3758691406 MIN:12977 MAX:3090275
test1-locks.latency.FXATTROP=AVG:22481.180641 CNT:160002 TOTAL:3597033865 MIN:12554 MAX:644153
test1-changelog.latency.FXATTROP=AVG:19924.967688 CNT:160002 TOTAL:3188034680 MIN:11089 MAX:641805
test1-posix.latency.FXATTROP=AVG:18525.418420 CNT:160002 TOTAL:2964103998 MIN:10491 MAX:640402

After applying the patch the latency is as shown below; we see a significant reduction, which eventually improves performance.

test1-server.latency.FXATTROP=AVG:53384.242509 CNT:160002 TOTAL:8541585570 MIN:27837 MAX:5715557
GlusterFS 4.x v1.latency.FXATTROP=AVG:60202.303306 CNT:159997 TOTAL:9632187922 MIN:31917 MAX:5724649
test1-index.latency.FXATTROP=AVG:48083.318140 CNT:160002 TOTAL:7693427069 MIN:23682 MAX:5711187
test1-io-threads.latency.FXATTROP=AVG:31372.965019 CNT:160002 TOTAL:5019737149 MIN:17847 MAX:5699012
test1-upcall.latency.FXATTROP=AVG:22114.942188 CNT:160002 TOTAL:3538434980 MIN:12614 MAX:5678772
test1-locks.latency.FXATTROP=AVG:21309.776753 CNT:160002 TOTAL:3409606900 MIN:12197 MAX:5677554
test1-changelog.latency.FXATTROP=AVG:18889.344833 CNT:160002 TOTAL:3022332952 MIN:10856 MAX:5675312
test1-posix.latency.FXATTROP=AVG:17818.108236 CNT:160002 TOTAL:2850932954 MIN:10372 MAX:5673609

To validate the patch I executed the test case below on a 1x3 volume without enabling any option.

echo 3 > /proc/sys/vm/drop_caches; /root/smallfile/smallfile_cli.py --operation create --threads 8 --file-size 64 --files 10000 --top /mnt/test
echo 3 > /proc/sys/vm/drop_caches; /root/smallfile/smallfile_cli.py --operation cleanup --threads 8 --file-size 64 --files 10000 --top /mnt/test
echo 3 > /proc/sys/vm/drop_caches; /root/smallfile/smallfile_cli.py --operation create --threads 8 --file-size 64 --files 10000 --top /mnt/test
echo 3 > /proc/sys/vm/drop_caches; /root/smallfile/smallfile_cli.py --operation cleanup --threads 8 --file-size 64 --files 10000 --top /mnt/test
echo 3 > /proc/sys/vm/drop_caches; /root/smallfile/smallfile_cli.py --operation create --threads 8 --file-size 64 --files 10000 --top /mnt/test
echo 3 > /proc/sys/vm/drop_caches; /root/smallfile/smallfile_cli.py --operation cleanup --threads 8 --file-size 64 --files 10000 --top /mnt/test
echo 3 > /proc/sys/vm/drop_caches; /root/smallfile/smallfile_cli.py --operation create --threads 8 --file-size 64 --files 10000 --top /mnt/test
echo 3 > /proc/sys/vm/drop_caches; /root/smallfile/smallfile_cli.py --operation cleanup --threads 8 --file-size 64 --files 10000 --top /mnt/test

With Patch

1st attempt

total threads = 8 total files = 80000 total data = 4.883 GB 100.00% of requested files processed, minimum is 90.00 43.880256 sec elapsed time 1823.143423 files/sec 1823.143423 IOPS 113.946464 MB/sec

2nd attempt

total threads = 8 total files = 75300 total data = 4.596 GB 94.12% of requested files processed, minimum is 90.00 65.529770 sec elapsed time 1149.096054 files/sec 1149.096054 IOPS 71.818503 MB/sec

3rd attempt

total threads = 8 total files = 73000 total data = 4.456 GB 91.25% of requested files processed, minimum is 90.00 73.376040 sec elapsed time 994.875167 files/sec 994.875167 IOPS 62.179698 MB/sec

4th attempt

total threads = 8 total files = 76000 total data = 4.639 GB 95.00% of requested files processed, minimum is 90.00 74.406381 sec elapsed time 1021.417772 files/sec 1021.417772 IOPS 63.838611 MB/sec

Without Patch

1st attempt

total threads = 8 total files = 78500 total data = 4.791 GB 98.12% of requested files processed, minimum is 90.00 191.499534 sec elapsed time 409.922668 files/sec 409.922668 IOPS 25.620167 MB/sec

2nd attempt

total threads = 8 total files = 77300 total data = 4.718 GB 96.62% of requested files processed, minimum is 90.00 175.865830 sec elapsed time 439.539620 files/sec 439.539620 IOPS 27.471226 MB/sec

3rd attempt

total threads = 8 total files = 78600 total data = 4.797 GB 98.25% of requested files processed, minimum is 90.00 177.072729 sec elapsed time 443.885405 files/sec 443.885405 IOPS 27.742838 MB/sec

4th attempt

total threads = 8 total files = 75700 total data = 4.620 GB 94.62% of requested files processed, minimum is 90.00 156.726371 sec elapsed time 483.007419 files/sec 483.007419 IOPS 30.187964 MB/sec

As we can see, the patch shows a significant improvement for the create fop.

mohit84 avatar Jun 21 '22 13:06 mohit84

Really nice improvement !!!

Is the apparent throughput reduction after applying the patch real, or is it just coincidence (it starts at 114 MB/s and ends at 64 MB/s)? I would expect all tests to work at around 100 MB/s after the first one, at least.

xhernandez avatar Jun 22 '22 05:06 xhernandez

The numbers are real, it is not a coincidence; I tried multiple times and I always get the same numbers (minimum 62 MB/sec and maximum 114 MB/sec). I believe it is happening because of the high latency at the posix layer. If we compare the latency of the 1st and last attempts, MKDIR and CREATE clearly show a huge difference between them. I think unless the latency improves at the posix layer we will not be able to achieve 100 MB/s for all the tests.

1st attempt

test1-posix.latency.MKDIR=AVG:1242847.485052 CNT:1639 TOTAL:2037027028 MIN:12933 MAX:134156448
test1-posix.latency.CREATE=AVG:162552.781793 CNT:80002 TOTAL:13004547649 MIN:99486 MAX:12267699

Last attempt

test1-posix.latency.MKDIR=AVG:35279700.295609 CNT:1617 TOTAL:57047275378 MIN:116721 MAX:405093908
test1-posix.latency.CREATE=AVG:3076230.366153 CNT:80002 TOTAL:246104581753 MIN:95453 MAX:910196860

I will try to optimize things at the posix layer in a separate patch.

mohit84 avatar Jun 23 '22 05:06 mohit84

So the difference is only that the volume is not new ?

Have you checked which mkdir/create are taking longer ?

It would also be interesting to know if the difference comes from filesystem calls to XFS or it's something internal to posix xlator.

xhernandez avatar Jun 23 '22 07:06 xhernandez

> So the difference is only that the volume is not new ?
>
> Have you checked which mkdir/create are taking longer ?
>
> It would also be interesting to know if the difference comes from filesystem calls to XFS or it's something internal to posix xlator.

Yes, the difference is only that the volume is not new. IIUC, with a rotational disk this always happens; that's why we ignore the 1st attempt result when generating reports for perf data. With NVMe it does not happen.

mohit84 avatar Jun 23 '22 08:06 mohit84

Below is the data generated on the daily build server. As we can see, create and append show almost a 10% improvement. We can't compare this data with the previously generated data because this data was generated on NVMe by multiple clients, while the previous data was generated on a 1x3 rotational-disk volume with a single client.

FOP DAILY-BUILD PATCH (improvement)
create 15591 17277 (10 %)
ls-l 230623 235367 (2%)
chmod 24404 24340 (0%)
stat 35167 35199 (0%)
read 29141 29025 (0%)
append 13916 15205 (9%)
rename 976 985 (0%)
delete-renamed 22461 22603 (0%)
mkdir 3244 3241 (0%)
rmdir 2700 2689 (0%)
cleanup 9698 9482 (-2%)

mohit84 avatar Jun 23 '22 08:06 mohit84

Thank you for your contributions. Noticed that this issue is not having any activity in last ~6 months! We are marking this issue as stale because it has not had recent activity. It will be closed in 2 weeks if no one responds with a comment here.

stale[bot] avatar Jan 21 '23 13:01 stale[bot]

Closing this issue as there was no update since my last update on issue. If this is an issue which is still valid, feel free to open it.

stale[bot] avatar Jun 12 '23 06:06 stale[bot]