
[ntuple] Implement unbuffered parallel writing

Open hahnjo opened this issue 1 year ago • 3 comments

Instead of using one RPageSinkBuf per context, implement a synchronizing page sink that compresses pages and writes them through to storage, but only commits them when the context's cluster is ready. This uses much less memory, but results in higher lock contention and very fragmented files.


We likely don't want to merge this because buffered writing scales better and allows pages to be reordered, resulting in better read performance. But for future reference, this is how it could be implemented.
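To illustrate the write-through idea described above, here is a minimal toy sketch. It is not the code from this PR and does not use ROOT's classes: `SyncPageSink`, `PageLocator`, `WritePage`, `CommitCluster` and `WriteInParallel` are hypothetical names, the "file" is an in-memory string, and compression is a no-op placeholder. Each context compresses a page outside the lock, appends it to shared storage under the lock, and publishes the page locators as a cluster only on commit.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration; none of these are ROOT classes.
struct PageLocator {
   std::size_t offset; // where the page starts in the shared "file"
   std::size_t size;   // compressed size in bytes
};
using Cluster = std::vector<PageLocator>;

constexpr std::size_t kPageSize = 16; // assumed fixed page size for the toy

class SyncPageSink {
   std::mutex fMutex;
   std::string fFile;                         // stand-in for the storage backend
   std::unordered_map<int, Cluster> fPending; // written but not yet committed
   std::vector<Cluster> fCommitted;           // clusters visible in the metadata

   // Placeholder for real compression; called without holding the lock.
   static std::string Compress(const std::string &page) { return page; }

public:
   // Compress outside the lock, then write through to storage under the lock.
   void WritePage(int contextId, const std::string &page)
   {
      std::string zipped = Compress(page);
      std::lock_guard<std::mutex> guard(fMutex);
      fPending[contextId].push_back({fFile.size(), zipped.size()});
      fFile += zipped;
   }

   // The data is already in the "file"; only the locators are published here.
   void CommitCluster(int contextId)
   {
      std::lock_guard<std::mutex> guard(fMutex);
      auto it = fPending.find(contextId);
      if (it == fPending.end() || it->second.empty())
         return;
      fCommitted.push_back(std::move(it->second));
      fPending.erase(it);
   }

   std::size_t FileSize()
   {
      std::lock_guard<std::mutex> guard(fMutex);
      return fFile.size();
   }

   std::vector<Cluster> Committed()
   {
      std::lock_guard<std::mutex> guard(fMutex);
      return fCommitted;
   }
};

struct Result {
   std::size_t fileSize;
   std::vector<Cluster> clusters;
};

// Each thread acts as one context: it writes nPages pages, then commits.
Result WriteInParallel(int nThreads, int nClusters, int nPages)
{
   assert(nThreads > 0 && nThreads <= 26);
   SyncPageSink sink;
   std::vector<std::thread> workers;
   for (int t = 0; t < nThreads; ++t) {
      workers.emplace_back([&sink, t, nClusters, nPages] {
         for (int c = 0; c < nClusters; ++c) {
            for (int p = 0; p < nPages; ++p)
               sink.WritePage(t, std::string(kPageSize, static_cast<char>('a' + t)));
            sink.CommitCluster(t);
         }
      });
   }
   for (auto &w : workers)
      w.join();
   return {sink.FileSize(), sink.Committed()};
}
```

Calling `WriteInParallel(4, 3, 5)` from a `main` function yields 12 clusters of 5 pages each. Because every page takes the shared lock and is appended wherever the file currently ends, the pages of one cluster are typically interleaved with pages from other contexts, which is the contention and fragmentation trade-off mentioned in the description.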

hahnjo avatar Mar 12 '24 10:03 hahnjo

Starting build on ROOT-performance-centos8-multicore/soversion, ROOT-ubuntu2204/nortcxxmod, ROOT-ubuntu2004/python3, mac12arm/cxx20, windows10/default

How to customize builds

phsft-bot avatar Mar 12 '24 10:03 phsft-bot

Test Results

9 files  9 suites  1d 16h 47m 35s :stopwatch:

2 634 tests  2 633 :white_check_mark:  0 :zzz:  1 :x:

22 331 runs  22 330 :white_check_mark:  0 :zzz:  1 :x:

For more details on these failures, see this check.

Results for commit 9ad6150b.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Mar 12 '24 13:03 github-actions[bot]

Of note, this has a reverse conflict with https://github.com/root-project/root/pull/15239, which currently documents that parallel writing is always buffered.

hahnjo avatar May 07 '24 12:05 hahnjo

As discussed and mentioned before, we will require buffered writing with the RNTupleParallelWriter because of its better scalability and less fragmented output files.

hahnjo avatar Jul 29 '24 09:07 hahnjo