feat(core/bench): add fs, monoiofs and compfs benchmark
Part of #4552.
This PR introduces a benchmark that compares the performance of the OpenDAL services fs and monoiofs. compfs is included as well, but it is commented out because it is still a work in progress and did not finish the benchmark.
The concurrent benchmarks use .chunk(size).concurrent(parallel) rather than polling several independent IO tasks (which is how the bench/ops benchmarks work). I'm not sure which approach simulates real-world workloads better.
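To make the difference between the two concurrency styles concrete, here is a rough model using only the standard library. It is not OpenDAL code: threads stand in for async IO tasks, an in-memory copy stands in for the actual read, and the function names chunked_read and independent_reads are made up for illustration.

```rust
use std::thread;

/// Style A, loosely modelled on `.chunk(size).concurrent(parallel)`:
/// ONE logical read of `src`, split into `chunk`-sized pieces that are
/// copied by at most `parallel` workers.
fn chunked_read(src: &[u8], chunk: usize, parallel: usize) -> Vec<u8> {
    assert!(chunk > 0 && parallel > 0);
    let mut dst = vec![0u8; src.len()];
    // Pair every destination chunk with the source chunk at the same offset.
    let pairs: Vec<(&mut [u8], &[u8])> =
        dst.chunks_mut(chunk).zip(src.chunks(chunk)).collect();
    // Number of chunks each worker handles (ceiling division, at least 1).
    let per_worker = ((pairs.len() + parallel - 1) / parallel).max(1);
    thread::scope(|s| {
        let mut remaining = pairs.into_iter();
        loop {
            let batch: Vec<_> = remaining.by_ref().take(per_worker).collect();
            if batch.is_empty() {
                break;
            }
            s.spawn(move || {
                for (d, c) in batch {
                    d.copy_from_slice(c);
                }
            });
        }
    });
    dst
}

/// Style B, loosely modelled on polling several independent IO tasks:
/// one worker per source, each performing a complete, unrelated read.
fn independent_reads(srcs: &[Vec<u8>]) -> Vec<Vec<u8>> {
    thread::scope(|s| {
        let handles: Vec<_> = srcs
            .iter()
            .map(|src| s.spawn(move || src.clone()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

In style A the chunks belong to a single operation and must be reassembled in order, while in style B each task completes on its own. That difference in coordination is one reason the two styles may not measure the same thing.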
The full benchmark results are as follows. I'm going to write a progress report on monoiofs, along with a brief analysis of the results, on the mailing list, so stay tuned. :yum:
Benchmark result
read 4.00 KiB/fs time: [27.531 µs 27.614 µs 27.710 µs]
thrpt: [140.97 MiB/s 141.46 MiB/s 141.89 MiB/s]
read 4.00 KiB/monoiofs time: [30.880 µs 32.507 µs 34.392 µs]
thrpt: [113.58 MiB/s 120.17 MiB/s 126.50 MiB/s]
Found 20 outliers among 100 measurements (20.00%)
4 (4.00%) low severe
1 (1.00%) low mild
6 (6.00%) high mild
9 (9.00%) high severe
read 256 KiB/fs time: [64.932 µs 69.819 µs 73.971 µs]
thrpt: [3.3005 GiB/s 3.4968 GiB/s 3.7599 GiB/s]
read 256 KiB/monoiofs time: [49.171 µs 50.267 µs 51.240 µs]
thrpt: [4.7646 GiB/s 4.8568 GiB/s 4.9651 GiB/s]
read 4.00 MiB/fs time: [985.53 µs 986.56 µs 987.66 µs]
thrpt: [3.9551 GiB/s 3.9595 GiB/s 3.9636 GiB/s]
Found 11 outliers among 100 measurements (11.00%)
2 (2.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
4 (4.00%) high severe
read 4.00 MiB/monoiofs time: [230.23 µs 232.47 µs 234.54 µs]
thrpt: [16.655 GiB/s 16.803 GiB/s 16.967 GiB/s]
Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) low severe
Benchmarking read 16.0 MiB/fs: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 10.0s. You may wish to increase target time to 12.5s, enable flat sampling, or reduce sample count to 60.
read 16.0 MiB/fs time: [2.4754 ms 2.4769 ms 2.4785 ms]
thrpt: [6.3043 GiB/s 6.3084 GiB/s 6.3120 GiB/s]
Found 7 outliers among 100 measurements (7.00%)
5 (5.00%) high mild
2 (2.00%) high severe
read 16.0 MiB/monoiofs time: [1.7233 ms 1.7337 ms 1.7459 ms]
thrpt: [8.9495 GiB/s 9.0123 GiB/s 9.0668 GiB/s]
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) high mild
5 (5.00%) high severe
read concurrent 16x4.00 KiB/fs
time: [298.61 µs 306.31 µs 315.15 µs]
thrpt: [198.32 MiB/s 204.04 MiB/s 209.31 MiB/s]
read concurrent 16x4.00 KiB/monoiofs
time: [234.62 µs 244.44 µs 253.48 µs]
thrpt: [246.57 MiB/s 255.69 MiB/s 266.39 MiB/s]
read concurrent 16x256 KiB/fs
time: [399.76 µs 404.79 µs 409.38 µs]
thrpt: [9.5420 GiB/s 9.6502 GiB/s 9.7715 GiB/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
read concurrent 16x256 KiB/monoiofs
time: [327.93 µs 339.69 µs 351.85 µs]
thrpt: [11.102 GiB/s 11.500 GiB/s 11.912 GiB/s]
read concurrent 16x4.00 MiB/fs
time: [6.9919 ms 7.0599 ms 7.1296 ms]
thrpt: [8.7663 GiB/s 8.8528 GiB/s 8.9389 GiB/s]
read concurrent 16x4.00 MiB/monoiofs
time: [8.7125 ms 8.7150 ms 8.7180 ms]
thrpt: [7.1691 GiB/s 7.1715 GiB/s 7.1736 GiB/s]
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
read concurrent 16x16.0 MiB/fs
time: [24.318 ms 24.443 ms 24.569 ms]
thrpt: [10.175 GiB/s 10.228 GiB/s 10.280 GiB/s]
read concurrent 16x16.0 MiB/monoiofs
time: [33.131 ms 33.149 ms 33.168 ms]
thrpt: [7.5375 GiB/s 7.5417 GiB/s 7.5458 GiB/s]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
write 4.00 KiB/fs time: [32.943 µs 33.006 µs 33.073 µs]
thrpt: [118.11 MiB/s 118.35 MiB/s 118.58 MiB/s]
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) high mild
13 (13.00%) high severe
write 4.00 KiB/monoiofs time: [62.287 µs 66.421 µs 70.833 µs]
thrpt: [55.148 MiB/s 58.810 MiB/s 62.713 MiB/s]
Found 14 outliers among 100 measurements (14.00%)
12 (12.00%) low severe
2 (2.00%) high severe
write 256 KiB/fs time: [122.46 µs 129.67 µs 137.91 µs]
thrpt: [1.7702 GiB/s 1.8828 GiB/s 1.9936 GiB/s]
Found 21 outliers among 100 measurements (21.00%)
20 (20.00%) low severe
1 (1.00%) high severe
write 256 KiB/monoiofs time: [125.98 µs 126.22 µs 126.46 µs]
thrpt: [1.9306 GiB/s 1.9343 GiB/s 1.9380 GiB/s]
write 4.00 MiB/fs time: [1.8901 ms 1.9275 ms 1.9553 ms]
thrpt: [1.9978 GiB/s 2.0265 GiB/s 2.0667 GiB/s]
write 4.00 MiB/monoiofs time: [1.1394 ms 1.1405 ms 1.1416 ms]
thrpt: [3.4218 GiB/s 3.4251 GiB/s 3.4283 GiB/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe
write 16.0 MiB/fs time: [5.0289 ms 5.0352 ms 5.0416 ms]
thrpt: [3.0992 GiB/s 3.1031 GiB/s 3.1071 GiB/s]
write 16.0 MiB/monoiofs time: [5.3379 ms 5.3411 ms 5.3444 ms]
thrpt: [2.9236 GiB/s 2.9254 GiB/s 2.9272 GiB/s]
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
write concurrent 16x4.00 KiB/fs
time: [150.81 µs 150.93 µs 151.05 µs]
thrpt: [413.78 MiB/s 414.09 MiB/s 414.43 MiB/s]
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
write concurrent 16x4.00 KiB/monoiofs
time: [324.57 µs 335.41 µs 346.54 µs]
thrpt: [180.36 MiB/s 186.34 MiB/s 192.56 MiB/s]
Found 12 outliers among 100 measurements (12.00%)
10 (10.00%) low severe
1 (1.00%) high mild
1 (1.00%) high severe
write concurrent 16x256 KiB/fs
time: [1.6521 ms 1.6540 ms 1.6558 ms]
thrpt: [2.3591 GiB/s 2.3617 GiB/s 2.3644 GiB/s]
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
2 (2.00%) high severe
write concurrent 16x256 KiB/monoiofs
time: [1.3098 ms 1.3107 ms 1.3118 ms]
thrpt: [2.9779 GiB/s 2.9802 GiB/s 2.9824 GiB/s]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
write concurrent 16x4.00 MiB/fs
time: [26.624 ms 26.687 ms 26.750 ms]
thrpt: [2.3364 GiB/s 2.3420 GiB/s 2.3475 GiB/s]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
write concurrent 16x4.00 MiB/monoiofs
time: [23.338 ms 23.354 ms 23.369 ms]
thrpt: [2.6745 GiB/s 2.6762 GiB/s 2.6780 GiB/s]
Found 6 outliers among 100 measurements (6.00%)
6 (6.00%) low mild
write concurrent 16x16.0 MiB/fs
time: [95.318 ms 95.462 ms 95.609 ms]
thrpt: [2.6148 GiB/s 2.6188 GiB/s 2.6228 GiB/s]
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
write concurrent 16x16.0 MiB/monoiofs
time: [94.194 ms 94.233 ms 94.275 ms]
thrpt: [2.6518 GiB/s 2.6530 GiB/s 2.6541 GiB/s]
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe
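The thrpt figures above are consistent with bytes processed divided by elapsed time, where the concurrent cases count all 16 buffers. A small sketch for reproducing a few of them by hand (the helper names are hypothetical and not part of the benchmark code):

```rust
const KIB: f64 = 1024.0;
const MIB: f64 = 1024.0 * KIB;
const GIB: f64 = 1024.0 * MIB;

/// Bytes per second for `bytes` processed in `micros` microseconds.
fn bytes_per_sec(bytes: f64, micros: f64) -> f64 {
    bytes / (micros * 1e-6)
}

/// Throughput in MiB/s.
fn mib_per_sec(bytes: f64, micros: f64) -> f64 {
    bytes_per_sec(bytes, micros) / MIB
}

/// Throughput in GiB/s.
fn gib_per_sec(bytes: f64, micros: f64) -> f64 {
    bytes_per_sec(bytes, micros) / GIB
}
```

For example, mib_per_sec(4.0 * KIB, 27.614) corresponds to the median of read 4.00 KiB/fs (reported as 141.46 MiB/s), and mib_per_sec(16.0 * 4.0 * KIB, 306.31) corresponds to the median of read concurrent 16x4.00 KiB/fs (reported as 204.04 MiB/s).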
Thank you very much for your work! I have noticed that Monoio performs well on single reads but not as effectively on concurrent ones. Could this be related to our thread-per-core design? Are there any plans to improve it?
Monoiofs is currently single-threaded (let worker_threads = 1;) while fs runs on multiple threads. There seems to be a deadlock bug with multiple worker threads that I'm investigating. Performance should improve once the bug is fixed, the worker thread pool is enabled, and the workers are bound to CPU cores.
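For readers unfamiliar with the idea, a worker thread pool of this general shape can be sketched with the standard library alone. This is only an illustration under assumptions, not the monoiofs implementation: the WorkerPool type, its methods, and the round-robin dispatch are hypothetical, plain closures stand in for IO requests, and core pinning is left out because std has no API for it.

```rust
use std::sync::mpsc;
use std::thread;

/// A unit of work; a stand-in for an IO request (hypothetical).
type Job = Box<dyn FnOnce() -> u64 + Send + 'static>;

/// Hypothetical pool: every worker thread owns its own queue.
struct WorkerPool {
    senders: Vec<mpsc::Sender<(Job, mpsc::Sender<u64>)>>,
    handles: Vec<thread::JoinHandle<()>>,
    next: usize,
}

impl WorkerPool {
    fn new(worker_threads: usize) -> Self {
        assert!(worker_threads > 0);
        let mut senders = Vec::new();
        let mut handles = Vec::new();
        for _ in 0..worker_threads {
            let (tx, rx) = mpsc::channel::<(Job, mpsc::Sender<u64>)>();
            handles.push(thread::spawn(move || {
                // A real worker would start a per-thread runtime here and
                // could pin itself to a CPU core (not shown).
                for (job, reply) in rx {
                    // Ignore the error if the caller dropped its receiver.
                    let _ = reply.send(job());
                }
            }));
            senders.push(tx);
        }
        WorkerPool { senders, handles, next: 0 }
    }

    /// Dispatch a job round-robin and return a receiver for its result.
    fn submit(&mut self, job: Job) -> mpsc::Receiver<u64> {
        let (reply_tx, reply_rx) = mpsc::channel();
        let idx = self.next % self.senders.len();
        self.next += 1;
        self.senders[idx].send((job, reply_tx)).expect("worker exited");
        reply_rx
    }

    /// Close all queues and wait for the workers to finish.
    fn shutdown(self) {
        drop(self.senders);
        for handle in self.handles {
            handle.join().unwrap();
        }
    }
}
```

With worker_threads set to 1 every job is serialized on a single thread, which matches the single-threaded behaviour described above. Raising it spreads jobs across threads at the cost of cross-thread coordination.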
Also, PositionWrite is not implemented for monoiofs.
This PR is unlikely to make progress in the near future, so let's close it for now. You're welcome to open a new one in the future.