
Implement parallel and splittable bzip2 read buffer and apply it to file engine

Open · taiyang-li opened this issue 1 year ago · 14 comments

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

In Hadoop, a single bzip2 compressed file is split into several splits, and each task processes one split, which means Hadoop MR can process a single bzip2 file in parallel, especially when the file is large. Refer to: https://issues.apache.org/jira/browse/HADOOP-4012

This PR implements the same technique that Hadoop uses; the core idea is sketched below.
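For context on why splitting is possible at all: every bzip2 compressed block begins with the 48-bit magic number 0x314159265359 (and a stream ends with the magic 0x177245385090), and block boundaries are bit-aligned rather than byte-aligned. A split reader can therefore scan its assigned byte range for the next block magic and start decoding from there. A minimal standalone sketch of that scan (illustration only, not code from this PR; the search is done bit by bit for clarity, real implementations are more careful and faster):

#include <cstdint>
#include <vector>

// Return the bit offset of the first bzip2 block header in `data`, or -1.
// Block headers are not byte-aligned, so every bit position is checked.
int64_t findBlockMagic(const std::vector<uint8_t> & data)
{
    constexpr uint64_t BLOCK_MAGIC = 0x314159265359ULL;  // starts every compressed block
    constexpr uint64_t MASK = (1ULL << 48) - 1;
    uint64_t window = 0;
    for (size_t bit = 0; bit < data.size() * 8; ++bit)
    {
        window = ((window << 1) | ((data[bit / 8] >> (7 - bit % 8)) & 1)) & MASK;
        if (bit >= 47 && window == BLOCK_MAGIC)
            return static_cast<int64_t>(bit) - 47;  // bit offset where the magic begins
    }
    return -1;
}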

parallel decompress & non-parallel parsing

./clickhouse local --allow_parallel_decompress=1 --max_download_buffer_size=2097152 --input_format_parallel_parsing=1               
SELECT *
FROM file('/data1/liyang/root/2.bz2', 'JSONEachRow')
FORMAT `Null`

Query id: 4d8444b4-6428-4140-b60c-9660f828c44a

Ok.

0 rows in set. Elapsed: 2.172 sec. Processed 130.82 thousand rows, 6.24 MB (60.22 thousand rows/s., 2.87 MB/s.)
Peak memory usage: 220.40 MiB.

parallel decompress & parallel parsing

SELECT *
FROM file('/data1/liyang/root/2.bz2', 'JSONEachRow')
FORMAT `Null`

Query id: 979a4c5e-718a-446f-b37f-bbb885221baf

Ok.

0 rows in set. Elapsed: 2.377 sec. Processed 166.81 thousand rows, 78.10 MB (70.19 thousand rows/s., 32.86 MB/s.)
Peak memory usage: 237.50 MiB.

non-parallel decompress & parallel parsing

SELECT *
FROM file('/data1/liyang/root/2.bz2', 'JSONEachRow')
FORMAT `Null`

Query id: cfe79912-9e81-45fe-b67f-f2c74447cea4

Ok.

0 rows in set. Elapsed: 5.782 sec. Processed 181.98 thousand rows, 14.08 MB (31.47 thousand rows/s., 2.44 MB/s.)
Peak memory usage: 139.68 MiB.

taiyang-li (Jan 12 '24)

This is an automated comment for commit 021961b9f1dca6d4d8b3d1cc53d92f8d058486e3 with a description of existing statuses. It is updated for the latest CI run.

❌ A full report is available on a separate page

Check name | Description | Status
--- | --- | ---
CI running | A meta-check that indicates the running CI. Normally, it's in a success or pending state. A failed status indicates problems with the PR | ⏳ pending
ClickHouse build check | Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often have enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log by grepping for cmake. Use these options and follow the general build process | ❌ failure
Flaky tests | Checks whether newly added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer and additional randomization of thread scheduling. Integration tests are run up to 10 times. If a new test fails at least once, or runs too long, this check turns red. We don't allow flaky tests; read the doc | ❌ failure
Mergeable Check | Checks whether all other necessary checks are successful | ❌ failure
Stateless tests | Runs stateless functional tests for ClickHouse binaries built in various configurations: release, debug, with sanitizers, etc. | ❌ failure
Upgrade check | Runs stress tests on a server of the last released version and then tries to upgrade it to the version from the PR. It checks whether the new server can start up successfully without errors, crashes, or sanitizer asserts | ❌ failure
Successful checks
Check name | Description | Status
--- | --- | ---
A Sync | There's no description for the check yet; please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
AST fuzzer | Runs randomly generated queries to catch program errors. The build type is optionally given in parentheses. If it fails, ask a maintainer for help | ✅ success
ClickBench | Runs [ClickBench](https://github.com/ClickHouse/ClickBench/) with an instant-attach table | ✅ success
Compatibility check | Checks that the clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help | ✅ success
Docker keeper image | Builds and optionally pushes the mentioned image to Docker Hub | ✅ success
Docker server image | Builds and optionally pushes the mentioned image to Docker Hub | ✅ success
Docs check | Builds and tests the documentation | ✅ success
Fast test | Normally the first check run for a PR. It builds ClickHouse and runs most of the stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here | ✅ success
Install packages | Checks that the built packages are installable in a clean environment | ✅ success
Integration tests | The integration tests report. The package type is given in parentheses, and the optional part/total tests in square brackets | ✅ success
PR Check | There's no description for the check yet; please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS | ✅ success
Performance Comparison | Measures changes in query performance. The performance test report is described in detail here. The optional part/total tests are given in square brackets | ✅ success
Stateful tests | Runs stateful functional tests for ClickHouse binaries built in various configurations: release, debug, with sanitizers, etc. | ✅ success
Stress test | Runs stateless functional tests concurrently from several clients to detect concurrency-related errors | ✅ success
Style check | Runs a set of checks to keep the code style clean. If some tests fail, see the related log from the report | ✅ success
Unit tests | Runs the unit tests for different release types | ✅ success

robot-ch-test-poll3 (Jan 12 '24)

Could you also please take over this PR and finish it? https://github.com/ClickHouse/ClickHouse/pull/36933

alexey-milovidov (Jan 12 '24)

> Could you also please take over this PR and finish it? #36933

I'd like to.

taiyang-li (Jan 14 '24)

Some tests based on src/IO/examples/read_buffer_splittable_bzip2.cpp and the command-line tool bunzip2. Note that (a rough sketch of such a driver follows the list):

  • decompressFromSplits first splits the whole bzip2 file into multiple splits, then decompresses each split serially using the newly added SplittableBzip2ReadBuffer
  • parallelDecompressFromSplits first splits the whole bzip2 file into multiple splits, then decompresses the splits in parallel using the newly added ParallelBzip2ReadBuffer
  • decompressFromFile decompresses the whole bzip2 file using the original Bzip2ReadBuffer
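For reference, a rough sketch of how the parallel case might be driven (the real driver lives in src/IO/examples/read_buffer_splittable_bzip2.cpp; the ParallelBzip2ReadBuffer constructor arguments here are assumptions inferred from the benchmark settings quoted below, while ReadBufferFromFile, NullWriteBuffer, and copyData are existing ClickHouse helpers):

#include <memory>
#include <string>
#include <IO/ReadBufferFromFile.h>
#include <IO/NullWriteBuffer.h>
#include <IO/copyData.h>

/// Hypothetical driver: split the file, decompress splits in parallel,
/// and drain the result without keeping it.
void parallelDecompressFromSplits(const std::string & path)
{
    auto in = std::make_unique<DB::ReadBufferFromFile>(path);
    /// ParallelBzip2ReadBuffer is added by this PR; the
    /// (source, max_split_bytes, max_working_readers) signature is a guess
    /// based on the settings reported in the measurements below.
    DB::ParallelBzip2ReadBuffer decompressed(std::move(in), 2 * 1024 * 1024, 16);
    DB::NullWriteBuffer out;
    DB::copyData(decompressed, out);  /// decompress everything, discard the bytes
}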

Decompress a small bzip2 file (size = 15 MB)

decompressFromSplits took 7.20827 seconds
parallelDecompressFromSplits took 1.62089 seconds (parallel settings: max_split_bytes = 2MB, max_working_readers = 16)
decompressFromFile took 6.48392 seconds
$ time bunzip2 -k  2.bz2  
bunzip2 -k 2.bz2  6.30s user 0.14s system 99% cpu 6.441 total

parallelDecompressFromSplits speeds up bzip2 decompression by about 4x in this case.

Decompress a large bzip2 file (size = 1.9 GB)

parallelDecompressFromSplits took 184.129 seconds (parallel settings: max_split_bytes = 10MB, max_working_readers = 4)
parallelDecompressFromSplits took 85.2435 seconds (parallel settings: max_split_bytes = 10MB, max_working_readers = 16)
parallelDecompressFromSplits took 63.2412 seconds (parallel settings: max_split_bytes = 10MB, max_working_readers = 32)
decompressFromFile took 513.919 seconds
$ time bunzip2 -k  1.bz2 
bunzip2 -k 1.bz2  527.65s user 11.66s system 99% cpu 8:59.65 total

parallelDecompressFromSplits speeds up bzip2 decompression by up to 8.1x in this case.

taiyang-li (Jan 16 '24)

This is an automatic comment. The PR description does not match the template.

Please edit it accordingly.

The error is: Category 'Performance improvement' is not valid

clickhouse-ci[bot] (Jan 16 '24)

  1. Errors in performance tests seem unrelated to this PR: https://s3.amazonaws.com/clickhouse-test-reports/58743/e23744108cf35c91103d7e7d1c8e852b2d19c150/performance_comparison_[1_4]/report.html

  2. The failures in https://s3.amazonaws.com/clickhouse-test-reports/58743/e23744108cf35c91103d7e7d1c8e852b2d19c150/stateless_tests_flaky_check__asan_.html will be fixed.

  3. I don't know how to avoid the errors in the upgrade check and need your help: https://s3.amazonaws.com/clickhouse-test-reports/58743/e23744108cf35c91103d7e7d1c8e852b2d19c150/upgrade_check__debug_.html

taiyang-li (Jan 23 '24)

https://s3.amazonaws.com/clickhouse-test-reports/58743/f5caa6809030ce71d88c5da6a41e666c3aafcc5f/stateless_tests_flaky_check__asan_.html

I'm confused about this failed test and need your help, @Algunenano: which log should I look at to solve it?

taiyang-li (Jan 31 '24)

Looking forward to your reviews, thank you very much!

taiyang-li (Feb 19 '24)

@alexey-milovidov I'm trying to use another parallel gz/xz codec implementation based on https://github.com/mxmlnkn/rapidgzip, which is different from the current implementation in https://github.com/ClickHouse/ClickHouse/pull/36933. Porting rapidgzip to CH is not easy work.

Could you please review and merge this PR first? Thanks!

taiyang-li (Feb 26 '24)

@alexey-milovidov Looking forward to your review, thanks!

taiyang-li (Mar 15 '24)

@alexey-milovidov Any comments? Thanks!

taiyang-li (Apr 07 '24)

Hi, @taiyang-li. I will be glad to review your PR. Could you please merge master and resolve conflicts to relaunch the CI system? Thanks in advance!

divanik (May 06 '24)

@taiyang-li, could you please explain why we can't use the bzip2 library and instead need our own implementation of bzip2 decompression in SplittableBzip2ReadBuffer.cpp?

divanik (May 06 '24)

> @taiyang-li, could you please explain why we can't use the bzip2 library and instead need our own implementation of bzip2 decompression in SplittableBzip2ReadBuffer.cpp?

In Hadoop, a single bzip2 compressed file is split into several splits, and each task processes one split, which means Hadoop MR can process a single bzip2 file in parallel, especially when the file is large. I implemented the same in CH, expecting it to bring performance benefits for bzip2 decompression.

taiyang-li (May 10 '24)

> In Hadoop, a single bzip2 compressed file is split into several splits, and each task processes one split, which means Hadoop MR can process a single bzip2 file in parallel, especially when the file is large. I implemented the same in CH, expecting it to bring performance benefits for bzip2 decompression.

Thank you for your response. I believe this logic is too complex for our codebase. Could you please list all the reasons why we cannot use the code from bzip2 to solve this task? Perhaps we only need to add a small amount of code to the bzip2 library to resolve this issue?

divanik (May 14 '24)

> In Hadoop, a single bzip2 compressed file is split into several splits, and each task processes one split, which means Hadoop MR can process a single bzip2 file in parallel, especially when the file is large. I implemented the same in CH, expecting it to bring performance benefits for bzip2 decompression.

> Thank you for your response. I believe this logic is too complex for our codebase. Could you please list all the reasons why we cannot use the code from bzip2 to solve this task? Perhaps we only need to add a small amount of code to the bzip2 library to resolve this issue?

I thought about it before, but I found that reusing the bzip2 lib to implement a splittable bzip2 read buffer is probably impossible: the bzip2 lib was designed to read a whole file rather than a single file split (see the sketch below). That is probably why Hadoop implemented its splittable bzip2 read buffer without the original bzip2 lib.
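For illustration, here is roughly what happens if a raw split is handed to stock libbz2 (this uses the real libbz2 API; the buffers are placeholders): BZ2_bzDecompress refuses input that does not start with the 'BZh' stream header, and the library exposes no public way to begin decoding at a block boundary in the middle of a stream.

#include <bzlib.h>
#include <cstdio>

// Feed libbz2 a chunk that starts at a block boundary instead of the
// 'BZh' stream header; it fails up front with BZ_DATA_ERROR_MAGIC.
void tryDecompressSplit(char * split, unsigned split_size, char * out, unsigned out_size)
{
    bz_stream strm{};
    BZ2_bzDecompressInit(&strm, /*verbosity*/ 0, /*small*/ 0);
    strm.next_in = split;
    strm.avail_in = split_size;
    strm.next_out = out;
    strm.avail_out = out_size;
    int rc = BZ2_bzDecompress(&strm);
    std::printf("BZ2_bzDecompress returned %d\n", rc);  // BZ_DATA_ERROR_MAGIC (-5)
    BZ2_bzDecompressEnd(&strm);
}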

taiyang-li (May 15 '24)

Dear @divanik, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

woolenwolfbot[bot] (Jun 25 '24)

@alexey-milovidov This PR has been blocked for a long time. The parallel bzip2 decompressor helps improve performance when reading bzip2 files. Do you think it is possible to merge it into CH? If not, I'll close it.

taiyang-li (Sep 04 '24)