SYSTEMDS-3534 Cache for build phase in bin and recode encoder
We implemented a Caffeine cache that is currently used by the bin and recode encoders. We tested and compared the runtime with and without caching in a separate test class. The first encoding run, which initializes the cache and therefore takes longer, is excluded from the measurements; we average over 10 runs for each setting.
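The PR wires the cache into the encoders themselves; as a minimal self-contained sketch of the idea (all class and method names here are hypothetical, and a plain ConcurrentHashMap stands in for the Caffeine library), caching the build-phase recode map by a column key might look like:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cache the recode build result (distinct value -> code)
// keyed by an identifier of the input column, so a repeated build on the same
// column reuses the cached metadata instead of rescanning the data.
public class BuildPhaseCacheSketch {
    private static final Map<String, Map<String, Integer>> CACHE = new ConcurrentHashMap<>();

    // Build a recode map for a column, reusing a cached result if present.
    public static Map<String, Integer> buildRecodeMap(String colKey, List<String> column) {
        return CACHE.computeIfAbsent(colKey, k -> {
            Map<String, Integer> recodeMap = new HashMap<>();
            for (String v : column)
                recodeMap.computeIfAbsent(v, x -> recodeMap.size() + 1);
            return recodeMap;
        });
    }

    public static void main(String[] args) {
        List<String> col = List.of("a", "b", "a", "c");
        Map<String, Integer> m1 = buildRecodeMap("frame1.col0", col);
        Map<String, Integer> m2 = buildRecodeMap("frame1.col0", col);
        // second call returns the identical cached map instance
        System.out.println(m1.get("a") + " " + m1.get("c") + " " + (m1 == m2));
    }
}
```

A production version would additionally need size bounds and eviction, which is what Caffeine provides out of the box.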
Thanks for the changes, @Mayaryin and @ingunnaf. I will have a detailed look in the next days. From the quick look, it is not the right idea to use an external library for caching. We want to build our own caching and cache management techniques for encoder intermediates. That would allow for a holistic integration to SystemDS' compiler and runtime infrastructure.
Codecov Report
Attention: Patch coverage is 84.26396% with 31 lines in your changes missing coverage. Please review.
Please upload report for BASE (main@f3e0926). Learn more about missing BASE report.
Additional details and impacted files
@@            Coverage Diff            @@
##              main     #2036   +/-   ##
=========================================
  Coverage         ?    68.84%
  Complexity       ?     40755
=========================================
  Files            ?      1447
  Lines            ?    161755
  Branches         ?     31432
=========================================
  Hits             ?    111363
  Misses           ?     41315
  Partials         ?      9077
We added a unit test and tests for multithreading. We attempted to test the cache via a DML script and a test dataset, but couldn't succeed. Our test results from the inner tests, as well as the test data generation script and the DML scripts, can be found in this repository: https://github.com/Mayaryin/systemds_test_data
Thanks @Mayaryin. I will take a look. Can you please summarize the performance comparison of single-threaded, multi-threaded, with and without cache in this PR to track it in the same place? You can maintain the scripts and setups in the other repository, but just mention the speedups here.
Also, please ensure that all tests pass.
I am unsure which tests exactly fail. When building locally, the build fails due to javadoc warnings; this can be avoided by adding <failOnError>false</failOnError> to the javadoc plugin configuration in the POM. Looking at the warnings, it seems that comments are missing in many classes, not restricted to the ones we added. Could we have a look at it together in a call this week?
Looks like builtin.part2 is failing. You can see the stack trace if you click Details. Take a look and try to reproduce the failure locally by running the same test. If still unsure, we can have a call later this week. Btw, I reran the tests to avoid any intermittent issues.
Test results
Testing only the build phase: averaging over 60 runs (columns) minus the first 10, 10000 rows; cache explicitly disabled/enabled:
- RECODE no cache: 2.104 ms
- RECODE cache: 1.022 ms
- BIN no cache: 0.127 ms
- BIN cache: 0.0825 ms
Multithreading: averaging over 60 runs (columns) minus the first 10, 10000 rows. The cache is not explicitly disabled; the execution times refer to the whole encode process, including build and apply.
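For reference, the build-phase speedups implied by the averages above can be computed directly (a trivial helper for the write-up, not part of the PR):

```java
// Compute cache speedups from the reported build-phase averages.
public class SpeedupCalc {
    static double speedup(double noCacheMs, double cacheMs) {
        return noCacheMs / cacheMs;
    }

    public static void main(String[] args) {
        // Numbers taken from the measurements reported above.
        System.out.printf("RECODE speedup: %.2fx%n", speedup(2.104, 1.022));  // ~2.06x
        System.out.printf("BIN speedup:    %.2fx%n", speedup(0.127, 0.0825)); // ~1.54x
    }
}
```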
I've seen that there is a problem with the dedup function, but this also occurs in at least one other PR.
All of the tests have passed now.
Thank you, @Mayaryin, @ingunnaf for your contribution. With some changes, I see a 2x speedup for real use cases, which is very good. I will not merge this PR immediately, as the changes are in the critical path of transformencode and may impact other running projects. However, after improving the robustness of this feature, I will merge it before the next release.
The list of TODOs includes:
- Integrate the lineage trace of the input frame into the key of the metadata cache, either just by adding the hash/checksum of the lineage trace or by making the build tasks lineage traceable. This extension will avoid incorrect reuse if the input frame is modified.
- The number of bins may need to be added to the key to avoid incorrect reuse across different numbers of bins.
- Improve the robustness of the hash function.
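To make the first two TODOs concrete, a composite key along these lines could prevent stale reuse; everything here (class name, fields) is a hypothetical sketch, not the PR's actual key class:

```java
import java.util.Objects;

// Hypothetical composite cache key: the lineage hash guards against reuse
// after the input frame has been modified, and the bin count distinguishes
// build results produced under different binning configurations.
public class EncoderCacheKey {
    private final int colId;        // column the encoder was built on
    private final long lineageHash; // hash/checksum of the input's lineage trace (assumed available)
    private final int numBins;      // number of bins; 0 for recode encoders

    public EncoderCacheKey(int colId, long lineageHash, int numBins) {
        this.colId = colId;
        this.lineageHash = lineageHash;
        this.numBins = numBins;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof EncoderCacheKey))
            return false;
        EncoderCacheKey k = (EncoderCacheKey) o;
        return colId == k.colId && lineageHash == k.lineageHash && numBins == k.numBins;
    }

    @Override
    public int hashCode() {
        return Objects.hash(colId, lineageHash, numBins);
    }
}
```

Making the build tasks fully lineage-traceable, as suggested above, would subsume the hash field with exact lineage comparison.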
Future work outside the scope of this PR:
- Caching and reuse of apply-task results, which requires an effective output allocation strategy.