[ntuple] Adaptive page sizes

Open jblomer opened this issue 6 months ago • 1 comments

Moves from fixed page sizes on write to adaptive page sizes, following the original idea of @hahnjo.

The new mechanism is explained in the tuning.md document in the PR.

The PR also bumps the target compressed cluster size to 150 MB. We may still want to reduce that. Evaluation of the new method is ongoing, and the PR description will be amended with the results.

EDIT: Comparison of current write performance vs. adaptive page sizes with 50 MB, 100 MB, and 150 MB target cluster sizes. To me, the results do not make a good case for 150 MB clusters. There may be a case for 100 MB clusters. For the moment, I'll remove the commit that changes the default settings from the PR.

An additional flavor, adaptive / exp, is included in the table to test the effect of flushing foreign columns, i.e. of one column triggering the flush of another column's page. In the experimental mode, columns only flush themselves, which simplifies the RWritePageMemoryManager and avoids the upcall from the sink to the column. In the nanoAOD sample, foreign flushes have a small positive effect on the file size. The effect is more visible in the number of pages. Memory consumption is slightly lower without foreign column flushes.
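
To make the difference between the two flavors concrete, here is a toy model of the two policies. It is only a sketch and not the code in this PR: `ToyPageBudget`, `ToyColumn`, the doubling growth rule, and the "evict the largest foreign page first" choice are all assumptions made for illustration; the actual mechanism is the one described in tuning.md.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy model only: none of these names are ROOT APIs.
struct ToyColumn {
   std::size_t capacity = 0;     // bytes reserved for the currently open page
   std::size_t used = 0;         // bytes written into the open page
   std::size_t pagesFlushed = 0; // pages handed to the (imaginary) sink
};

class ToyPageBudget {
public:
   ToyPageBudget(std::size_t nColumns, std::size_t initialPage, std::size_t maxPage,
                 std::size_t budget, bool foreignFlush)
      : fColumns(nColumns), fInitialPage(initialPage), fMaxPage(maxPage),
        fBudget(budget), fReserved(nColumns * initialPage), fForeignFlush(foreignFlush)
   {
      for (auto &c : fColumns)
         c.capacity = initialPage;
   }

   // Append nBytes (assumed <= initialPage) to the open page of column `col`.
   void Append(std::size_t col, std::size_t nBytes)
   {
      ToyColumn &c = fColumns[col];
      // Full page: try to grow it; if that fails, flush it and start small again.
      while (c.used + nBytes > c.capacity) {
         if (!TryGrow(col))
            Flush(col);
      }
      c.used += nBytes;
   }

   const ToyColumn &Column(std::size_t col) const { return fColumns[col]; }
   std::size_t Reserved() const { return fReserved; }

private:
   // Double the page of `col` (capped at fMaxPage) if the budget allows it.
   bool TryGrow(std::size_t col)
   {
      ToyColumn &c = fColumns[col];
      const std::size_t newCap = std::min(2 * c.capacity, fMaxPage);
      if (newCap <= c.capacity)
         return false; // already at the maximum page size
      const std::size_t extra = newCap - c.capacity;
      // Foreign flushes: evict the largest grown page of another column.
      while (fForeignFlush && fReserved + extra > fBudget) {
         std::size_t victim = col;
         for (std::size_t i = 0; i < fColumns.size(); ++i) {
            if (i == col || fColumns[i].capacity <= fInitialPage)
               continue;
            if (victim == col || fColumns[i].capacity > fColumns[victim].capacity)
               victim = i;
         }
         if (victim == col)
            break; // nothing left to evict
         Flush(victim);
      }
      if (fReserved + extra > fBudget)
         return false;
      c.capacity = newCap;
      fReserved += extra;
      return true;
   }

   // Write out the open page and shrink the column back to the initial size.
   void Flush(std::size_t col)
   {
      ToyColumn &c = fColumns[col];
      if (c.used > 0)
         ++c.pagesFlushed;
      fReserved -= c.capacity - fInitialPage;
      c.capacity = fInitialPage;
      c.used = 0;
   }

   std::vector<ToyColumn> fColumns;
   std::size_t fInitialPage;
   std::size_t fMaxPage;
   std::size_t fBudget;
   std::size_t fReserved; // total bytes reserved across all open pages
   bool fForeignFlush;
};
```

With `foreignFlush` set to `false`, a column that cannot grow within the budget flushes its own page; with `true`, it first flushes grown pages of other columns to free room. The model says nothing about file size or real memory usage; those are covered by the measurements in the table.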

I'll see if I can construct an example that better shows the advantage of foreign column flushes (or the lack of one).

As expected, the memory savings become visible for large EDMs (e.g., nanoAOD in this set of samples).

jblomer, Aug 26 '24 12:08