systemds icon indicating copy to clipboard operation
systemds copied to clipboard

SYSTEMDS-3534 Cache for build phase in bin and recode encoder

Open Mayaryin opened this issue 1 year ago • 11 comments

We implemented a caffeine cache that is currently used by the bin and recode encoders. We tested and compared the runtime with and without caching in a seperate test class. We separate a first run of encoding which initializes the cache and takes longer from the rest of our measurements. We average over 10 runs for each setting.

Mayaryin avatar Jun 25 '24 19:06 Mayaryin

Thanks for the changes, @Mayaryin and @ingunnaf. I will have a detailed look in the next days. From the quick look, it is not the right idea to use an external library for caching. We want to build our own caching and cache management techniques for encoder intermediates. That would allow for a holistic integration to SystemDS' compiler and runtime infrastructure.

phaniarnab avatar Jun 25 '24 21:06 phaniarnab

Codecov Report

Attention: Patch coverage is 84.26396% with 31 lines in your changes missing coverage. Please review.

Please upload report for BASE (main@f3e0926). Learn more about missing BASE report.

Files Patch % Lines
...sds/runtime/transform/encode/EncodeBuildCache.java 86.84% 7 Missing and 3 partials :warning:
.../sysds/runtime/transform/encode/BinBoundaries.java 68.42% 4 Missing and 2 partials :warning:
...sysds/runtime/transform/encode/EncodeCacheKey.java 70.00% 3 Missing and 3 partials :warning:
...sds/runtime/transform/encode/EncodeCacheEntry.java 66.66% 4 Missing and 1 partial :warning:
...sds/runtime/transform/encode/ColumnEncoderBin.java 91.66% 1 Missing and 1 partial :warning:
.../apache/sysds/runtime/matrix/data/MatrixBlock.java 0.00% 1 Missing :warning:
.../apache/sysds/runtime/transform/encode/RCDMap.java 85.71% 1 Missing :warning:
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2036   +/-   ##
=======================================
  Coverage        ?   68.84%           
  Complexity      ?    40755           
=======================================
  Files           ?     1447           
  Lines           ?   161755           
  Branches        ?    31432           
=======================================
  Hits            ?   111363           
  Misses          ?    41315           
  Partials        ?     9077           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Jul 04 '24 15:07 codecov-commenter

We added a unit test and a tests for multithreading. We attempted to test the cache via a dml script and a test dataset, but couldnt succeed. Our test results from the inner tests as well as the test data generation script and the dml scripts can be found in this repository: https://github.com/Mayaryin/systemds_test_data

Mayaryin avatar Jul 08 '24 20:07 Mayaryin

We added a unit test and a tests for multithreading. We attempted to test the cache via a dml script and a test dataset, but couldnt succeed. Our test results from the inner tests as well as the test data generation script and the dml scripts can be found in this repository: https://github.com/Mayaryin/systemds_test_data

Thanks @Mayaryin. I will take a look. Can you please summarize the performance comparison of single-threaded, multi-threaded, with and without cache in this PR to track it in the same place? You can maintain the scripts and setups in the other repository, but just mention the speedups here.

phaniarnab avatar Jul 08 '24 20:07 phaniarnab

Also, please ensure that all tests pass.

phaniarnab avatar Jul 08 '24 20:07 phaniarnab

I am unsure which tests exactly fail. When building locally the build fails due to javadoc warnings, this can be avoided by adding <failOnError>false</failOnError> in the javadoc plugin configuration in the POM. When looking at the warnings it seems that there are comments missing in many classes, but not restricted to the ones we added. Could we have a look at it together in a call this week?

Mayaryin avatar Jul 08 '24 21:07 Mayaryin

I am unsure which tests exactly fail. When building locally the build fails due to javadoc warnings, this can be avoided by adding false in the javadoc plugin configuration in the POM. When looking at the warnings it seems that there are comments missing in many classes, but not restricted to the ones we added. Could we have a look at it together in a call this week?

Looks like the builtin.part2 is failing. You can see the back trace if you click Details. Take a look and try to reproduce the failure locally by running the same test. If still unsure, we can have a call later this week. Btw, I reran the tests to avoid any intermittent issues.

phaniarnab avatar Jul 08 '24 21:07 phaniarnab

Test results

Testing only the build phase: averaging over 60 runs (columns) minus the first 10, 10000 rows cache is explicitly disabled/enabled RECODE no cache: 2,104 ms RECODE cache: 1,022 ms BIN no cache: 0,127 ms BIN cache: 0,0825 ms

Multithreading: averaging over 60 runs (columns) minus the first 10, 10000 rows cache is not explicitly disabled, the execution times refer to the whole encode process including build and apply image

Mayaryin avatar Jul 08 '24 21:07 Mayaryin

I've seen that there is a problem with the dedup function but this also occurs in at least one other PR

Mayaryin avatar Jul 08 '24 21:07 Mayaryin

All of the tests have passed now.

phaniarnab avatar Jul 08 '24 21:07 phaniarnab

Thank you, @Mayaryin, @ingunnaf for your contribution. With some changes, I see a 2x speedup for real use cases, which is very good. I will not merge this PR immediately, as the changes are in the critical path of transformencode and may impact other running projects. However, after improving the robustness of this feature, I will merge it before the next release.

List of TODOs include:

  • Integrate the lineage trace of the input frame into the key of the metadata cache, either just by adding the hash/checksum of the lineage trace or by making the build tasks lineage traceable. This extension will avoid incorrect reuse if the input frame is modified.
  • The number of bins may need to be added to the key to avoid incorrect reuse for different number of bins
  • Robustness of the hash function.

Future work outside the scope of this PR:

  • Caching and reuse apply task results, which requires effective output allocation strategy.

phaniarnab avatar Jul 21 '24 13:07 phaniarnab