rocksdb icon indicating copy to clipboard operation
rocksdb copied to clipboard

Add support in CompactForTieringCollectorFactory to collect data write time

Open jowlyzhang opened this issue 8 months ago • 5 comments

This PR adds support in CompactForTieringCollectorFactory and CompactForTieringCollector to collect and aggregate data write time metric.

In order to do this, an internal API TableBuilder::SetSeqnoTimeForTrackingWriteTime is added to be called before table building starts, so that write time tracking per data entry is available while table is being built.

Two public API TablePropertiesCollector::AddUserKeyWithWriteTime(...) and TablePropertiesCollector::WriteTimeTrackingEnabled(...) are added to enable table builder passing per data entry write time to table property collectors.

This PR also include implementation for the util method GetDataCollectionUnixWriteTimeInfoForFile to create a DataCollectionUnixWriteTimeInfo from table properties.

A follow up is needed to aggregate DataCollectionUnixWriteTimeInfo, such as across all the files on a level.

Test plan: Added unit tests

jowlyzhang avatar Apr 23 '25 01:04 jowlyzhang

@jowlyzhang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot avatar Apr 23 '25 18:04 facebook-github-bot

@jowlyzhang has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot avatar Apr 28 '25 19:04 facebook-github-bot

I would like to see a SST building-heavy performance test between (a) no tracking before this change, (b) no tracking after this change, (c) with tracking.

I did some benchmarking with this test script:

for test_branch in a b c; do
    if [[ $test_branch == "c" ]];
    then TRACK_WRITE_TIME=1;
    else TRACK_WRITE_TIME=0;
    fi;  
    for I in `seq 1 20`; do
        TEST_TMPDIR=/tmp/rocksdb_test ./db_bench_$test_branch -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -memtablerep=vector -allow_concurrent_memtable_write=0 -bloom_bits=10 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -preserve_internal_time_seconds=600 -track_write_time_user_property=$TRACK_WRITE_TIME 2>&1 | grep micros/op;  
    done | awk '{ t += $5; c++; print } END { print 1.0 * t / c }';   
    echo "Test branch $test_branch -track_write_time_user_property=$TRACK_WRITE_TIME"; 
done

db_bench_a is built with the first commit(Adding configuration in CompactForTieringCollectorFactory) and the last commit(Test track data write time in db_bench). db_bench_b and db_bench_c are the same binary built with all the commits.

I ran the script for three rounds, and it shows consistently a minor regression from (a) to (b), and bigger regression from (b) to (c): Branch a Branch b Branch c 969343 vs. 966310 vs. 935374 (ops/second) (a -> b) -0.3% (b -> c) -3%

979631 vs. 972503 vs. 937697 (ops/second) (a -> b) -0.7% (b -> c) -3.5%

975044 vs. 969513 vs. 935618 (ops/second) (a -> b) -0.5% (b -> c) -3.5%

jowlyzhang avatar Apr 28 '25 20:04 jowlyzhang

I'd hate for everyone to pay a small tax for this feature, and a sizable one for those actually using it.

Can we re-work this so that the inner loop of compaction is unmodified? E.g. pass a time mapping on collector creation and let the collector do any mapping itself?

Also, what do we expect the statistics for aggregation to look like? Do we need to map each entry's time before inputting it into statistics or can we gather statistics and then map the results? Percentile data (e.g. boundary for each 10% of entries by age), for example, can be mapped afterwards with no loss of information.

pdillinger avatar Apr 29 '25 15:04 pdillinger

pass a time mapping on collector creation and let the collector do any mapping itself?

Do we need to map each entry's time before inputting it into statistics or can we gather statistics and then map the results?

Thanks for checking this and offering these improvement ideas. I think both of these are promising. I will try these ideas out. Let me put this PR back to draft state.

jowlyzhang avatar Apr 29 '25 16:04 jowlyzhang