[Prototype] RAPIDS spilling manager
This is part of the effort to implement seamless spilling in cuDF and is just for testing for now.
The idea is to add a new column accessor, `SpillableColumnAccessor`, that can serialize and deserialize its columns in place, together with a new manager that orders column serializations triggered by `rmm.mr.FailureCallbackResourceAdaptor`.
As a demonstration, I have included `python/cudf/cudf/spilling-demo.py`, which keeps allocating random dataframes until device memory runs out, at which point spilling is triggered.
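For orientation, the triggering path can be sketched roughly like this. This is a simplified, hypothetical example rather than the code in this PR: only `rmm.mr.FailureCallbackResourceAdaptor`, `rmm.mr.CudaMemoryResource`, and `rmm.mr.set_current_device_resource` are real RMM APIs, and `spill_device_memory` is a stand-in for the new manager:

```python
# Sketch of wiring an out-of-memory callback to a spilling hook.
# `spill_device_memory` is a hypothetical placeholder for the manager
# introduced in this PR.
import rmm


def spill_device_memory(nbytes: int) -> int:
    # Placeholder: serialize spillable columns to host memory and
    # return the number of device bytes freed.
    return 0


def oom_handler(nbytes: int) -> bool:
    # RMM invokes this when a device allocation of `nbytes` fails.
    # Returning True asks RMM to retry the failed allocation.
    freed = spill_device_memory(nbytes)
    return freed >= nbytes


mr = rmm.mr.FailureCallbackResourceAdaptor(rmm.mr.CudaMemoryResource(), oom_handler)
rmm.mr.set_current_device_resource(mr)
```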
Output of running `spilling-demo.py` on my workstation:

```
$ python python/cudf/cudf/spilling-demo.py
Initial state - device: 1.918 GB, host: 1.154 GB
[ 0] dataframes: 2.235 GB, device: 6.437 GB, host: 1.168 GB
[ 1] dataframes: 4.470 GB, device: 8.675 GB, host: 1.168 GB
[ 2] dataframes: 6.706 GB, device: 10.911 GB, host: 1.168 GB
[ 3] dataframes: 8.941 GB, device: 13.148 GB, host: 1.168 GB
[ 4] dataframes: 11.176 GB, device: 15.395 GB, host: 1.168 GB
[ 5] dataframes: 13.411 GB, device: 17.633 GB, host: 1.168 GB
[ 6] dataframes: 15.646 GB, device: 19.871 GB, host: 1.168 GB
[ 7] dataframes: 17.881 GB, device: 22.109 GB, host: 1.168 GB
[ 8] dataframes: 20.117 GB, device: 24.348 GB, host: 1.168 GB
[ 9] dataframes: 22.352 GB, device: 26.586 GB, host: 1.168 GB
[10] dataframes: 24.587 GB, device: 28.824 GB, host: 1.169 GB
[11] dataframes: 26.822 GB, device: 31.062 GB, host: 1.169 GB
[12] dataframes: 29.057 GB, device: 29.562 GB, host: 4.883 GB
[13] dataframes: 31.292 GB, device: 29.562 GB, host: 7.118 GB
[14] dataframes: 33.528 GB, device: 29.562 GB, host: 9.353 GB
Spill all device memory
Spilling column: 0.745 GB, device: 28.815 GB, host: 10.098 GB
Spilling column: 0.745 GB, device: 28.069 GB, host: 10.843 GB
Spilling column: 0.745 GB, device: 27.332 GB, host: 11.588 GB
Spilling column: 0.745 GB, device: 26.582 GB, host: 12.333 GB
Spilling column: 0.745 GB, device: 25.836 GB, host: 13.078 GB
Spilling column: 0.745 GB, device: 25.090 GB, host: 13.823 GB
Spilling column: 0.745 GB, device: 24.344 GB, host: 14.568 GB
Spilling column: 0.745 GB, device: 23.605 GB, host: 15.313 GB
Spilling column: 0.745 GB, device: 22.859 GB, host: 16.058 GB
Spilling column: 0.745 GB, device: 22.109 GB, host: 16.804 GB
Spilling column: 0.745 GB, device: 21.363 GB, host: 17.549 GB
Spilling column: 0.745 GB, device: 20.621 GB, host: 18.294 GB
Spilling column: 0.745 GB, device: 19.875 GB, host: 19.039 GB
Spilling column: 0.745 GB, device: 19.125 GB, host: 19.784 GB
Spilling column: 0.745 GB, device: 18.379 GB, host: 20.529 GB
Spilling column: 0.745 GB, device: 17.633 GB, host: 21.274 GB
Spilling column: 0.745 GB, device: 16.891 GB, host: 22.019 GB
Spilling column: 0.745 GB, device: 16.145 GB, host: 22.764 GB
Spilling column: 0.745 GB, device: 15.398 GB, host: 23.509 GB
Spilling column: 0.745 GB, device: 14.652 GB, host: 24.254 GB
Spilling column: 0.745 GB, device: 13.906 GB, host: 24.999 GB
Spilling column: 0.745 GB, device: 13.156 GB, host: 25.744 GB
Spilling column: 0.745 GB, device: 12.414 GB, host: 26.489 GB
Spilling column: 0.745 GB, device: 11.668 GB, host: 27.235 GB
Spilling column: 0.745 GB, device: 10.922 GB, host: 27.979 GB
Spilling column: 0.745 GB, device: 10.176 GB, host: 28.724 GB
Spilling column: 0.745 GB, device: 9.426 GB, host: 29.470 GB
Spilling column: 0.745 GB, device: 8.680 GB, host: 30.215 GB
Spilling column: 0.745 GB, device: 7.934 GB, host: 30.960 GB
Spilling column: 0.745 GB, device: 7.188 GB, host: 31.705 GB
Spilling column: 0.745 GB, device: 6.441 GB, host: 32.450 GB
Spilling column: 0.745 GB, device: 5.699 GB, host: 33.195 GB
Spilling column: 0.745 GB, device: 4.949 GB, host: 33.940 GB
Spilling column: 0.745 GB, device: 4.203 GB, host: 34.685 GB
Spilling column: 0.745 GB, device: 3.461 GB, host: 35.430 GB
Spilling column: 0.745 GB, device: 2.715 GB, host: 36.175 GB
Spilling column: 0.745 GB, device: 1.969 GB, host: 36.920 GB
Finished spilling - device: 1.969 GB, host: 36.920 GB
Access spilled dataframes
[ 0] dataframe access, device: 8.680 GB, host: 32.450 GB
[ 1] dataframe access, device: 10.918 GB, host: 30.215 GB
[ 2] dataframe access, device: 13.156 GB, host: 27.980 GB
[ 3] dataframe access, device: 15.399 GB, host: 25.744 GB
[ 4] dataframe access, device: 17.634 GB, host: 23.509 GB
[ 5] dataframe access, device: 19.872 GB, host: 21.274 GB
[ 6] dataframe access, device: 22.096 GB, host: 19.039 GB
[ 7] dataframe access, device: 24.318 GB, host: 16.804 GB
[ 8] dataframe access, device: 26.563 GB, host: 14.569 GB
[ 9] dataframe access, device: 28.805 GB, host: 12.333 GB
[10] dataframe access, device: 29.552 GB, host: 11.588 GB
[11] dataframe access, device: 29.552 GB, host: 11.588 GB
[12] dataframe access, device: 29.547 GB, host: 11.588 GB
[13] dataframe access, device: 29.543 GB, host: 11.588 GB
[14] dataframe access, device: 29.543 GB, host: 11.588 GB
Deleting dataframes - device: 1.937 GB, host: 1.157 GB
Initial/end state delta - device: 0.019531 GB, host: 0.003410 GB
```
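For context, the demo has roughly the following shape (a minimal sketch, not the actual contents of `spilling-demo.py`; the column size is chosen only to approximate the numbers above, and the explicit spill-everything step between the two phases, which uses the new manager, is omitted):

```python
import numpy as np
import cudf

# Phase 1: keep allocating random dataframes; once device memory runs out,
# the failure callback spills already-allocated columns to host memory.
dataframes = [
    cudf.DataFrame({"a": np.random.random(100_000_000)}) for _ in range(15)
]

# Phase 2: accessing a spilled dataframe deserializes its columns back
# onto the device (and may in turn spill other columns).
for df in dataframes:
    df["a"].sum()
```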
cc @shwina @quasiben
Codecov Report
:exclamation: No coverage uploaded for pull request base (`branch-22.08@bad00d7`). The diff coverage is n/a.
:exclamation: Current head `f7d42ec` differs from pull request most recent head `d1317a6`. Consider uploading reports for the commit `d1317a6` to get more accurate results.
```
@@           Coverage Diff            @@
##             branch-22.08   #10746   +/-   ##
===============================================
  Coverage                ?   85.90%
===============================================
  Files                   ?      147
  Lines                   ?    23123
  Branches                ?        0
===============================================
  Hits                    ?    19864
  Misses                  ?     3259
  Partials                ?        0
```
Is this dependent on the hacks @shwina did in https://github.com/rapidsai/cudf/pull/10592 or did we find some way around needing to do that?
Sync'd offline with Jake, but for completeness: It's not independent. We're going to have to incorporate those changes here eventually. This is not ready for review.
This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.
Closed in favor of https://github.com/rapidsai/cudf/pull/11553