[Prototype] Rapids spilling manager

Open madsbk opened this issue 3 years ago • 4 comments

This is part of the effort to implement seamless spilling in cuDF and is for testing only for now.

The idea is to have a new column accessor, `SpillableColumnAccessor`, that can serialize and deserialize its columns in place, and a new manager that orders the column serializations triggered by `rmm.mr.FailureCallbackResourceAdaptor`.
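To make the idea concrete, here is a minimal pure-Python sketch of the manager side, not the code in this PR. `SpillableBuffer`, `SpillManager`, `device_limit`, and the least-recently-used ordering are all hypothetical names and choices made for illustration; device memory is simulated with a byte counter so the sketch runs without a GPU. The `on_alloc_failure` method is written in the shape I understand the RMM failure callback to take (requested size in, `True` to retry the allocation), which is an assumption here.

```python
from collections import OrderedDict


class SpillableBuffer:
    """Hypothetical stand-in for a column that lives on device or on host."""

    def __init__(self, name, nbytes):
        self.name = name
        self.nbytes = nbytes
        self.spilled = False  # False: on device, True: serialized to host


class SpillManager:
    """Hypothetical manager that decides which buffer to spill next."""

    def __init__(self, device_limit):
        self.device_limit = device_limit  # simulated device capacity in bytes
        self.device_used = 0
        self._on_device = OrderedDict()   # name -> buffer, oldest access first

    def on_alloc_failure(self, nbytes):
        """Spill one buffer; return True if the caller should retry.

        Assumed to mirror the callback passed to
        rmm.mr.FailureCallbackResourceAdaptor.
        """
        if not self._on_device:
            return False  # nothing left to spill, give up
        _, victim = self._on_device.popitem(last=False)  # least recently used
        victim.spilled = True
        self.device_used -= victim.nbytes
        return True

    def _reserve(self, nbytes):
        # Retry loop: keep spilling until the request fits or nothing is left.
        while self.device_used + nbytes > self.device_limit:
            if not self.on_alloc_failure(nbytes):
                raise MemoryError(f"cannot reserve {nbytes} bytes on device")
        self.device_used += nbytes

    def allocate(self, name, nbytes):
        self._reserve(nbytes)
        buf = SpillableBuffer(name, nbytes)
        self._on_device[name] = buf
        return buf

    def access(self, buf):
        """Bring a spilled buffer back to device and mark it recently used."""
        if buf.spilled:
            self._reserve(buf.nbytes)
            buf.spilled = False
            self._on_device[buf.name] = buf
        else:
            self._on_device.move_to_end(buf.name)
        return buf


manager = SpillManager(device_limit=10)
a = manager.allocate("a", 4)
b = manager.allocate("b", 4)
c = manager.allocate("c", 4)  # does not fit, so "a" (oldest) is spilled
manager.access(a)             # unspills "a", which in turn spills "b"
```

In the sketch, the ordering policy is isolated in `on_alloc_failure`, so a different policy (for example, spilling the largest buffer first) would only change that one method.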

As a demonstration, I have included `python/cudf/cudf/spilling-demo.py`, which keeps allocating random dataframes until it runs out of device memory, at which point spilling is triggered.

Output of running `spilling-demo.py` on my workstation:
$ python python/cudf/cudf/spilling-demo.py 
Initial state - device:  1.918 GB, host:  1.154 GB
[ 0] dataframes:  2.235 GB, device:  6.437 GB, host:  1.168 GB
[ 1] dataframes:  4.470 GB, device:  8.675 GB, host:  1.168 GB
[ 2] dataframes:  6.706 GB, device: 10.911 GB, host:  1.168 GB
[ 3] dataframes:  8.941 GB, device: 13.148 GB, host:  1.168 GB
[ 4] dataframes: 11.176 GB, device: 15.395 GB, host:  1.168 GB
[ 5] dataframes: 13.411 GB, device: 17.633 GB, host:  1.168 GB
[ 6] dataframes: 15.646 GB, device: 19.871 GB, host:  1.168 GB
[ 7] dataframes: 17.881 GB, device: 22.109 GB, host:  1.168 GB
[ 8] dataframes: 20.117 GB, device: 24.348 GB, host:  1.168 GB
[ 9] dataframes: 22.352 GB, device: 26.586 GB, host:  1.168 GB
[10] dataframes: 24.587 GB, device: 28.824 GB, host:  1.169 GB
[11] dataframes: 26.822 GB, device: 31.062 GB, host:  1.169 GB
[12] dataframes: 29.057 GB, device: 29.562 GB, host:  4.883 GB
[13] dataframes: 31.292 GB, device: 29.562 GB, host:  7.118 GB
[14] dataframes: 33.528 GB, device: 29.562 GB, host:  9.353 GB
Spill all device memory
Spilling column: 0.745 GB, device: 28.815 GB, host: 10.098 GB
Spilling column: 0.745 GB, device: 28.069 GB, host: 10.843 GB
Spilling column: 0.745 GB, device: 27.332 GB, host: 11.588 GB
Spilling column: 0.745 GB, device: 26.582 GB, host: 12.333 GB
Spilling column: 0.745 GB, device: 25.836 GB, host: 13.078 GB
Spilling column: 0.745 GB, device: 25.090 GB, host: 13.823 GB
Spilling column: 0.745 GB, device: 24.344 GB, host: 14.568 GB
Spilling column: 0.745 GB, device: 23.605 GB, host: 15.313 GB
Spilling column: 0.745 GB, device: 22.859 GB, host: 16.058 GB
Spilling column: 0.745 GB, device: 22.109 GB, host: 16.804 GB
Spilling column: 0.745 GB, device: 21.363 GB, host: 17.549 GB
Spilling column: 0.745 GB, device: 20.621 GB, host: 18.294 GB
Spilling column: 0.745 GB, device: 19.875 GB, host: 19.039 GB
Spilling column: 0.745 GB, device: 19.125 GB, host: 19.784 GB
Spilling column: 0.745 GB, device: 18.379 GB, host: 20.529 GB
Spilling column: 0.745 GB, device: 17.633 GB, host: 21.274 GB
Spilling column: 0.745 GB, device: 16.891 GB, host: 22.019 GB
Spilling column: 0.745 GB, device: 16.145 GB, host: 22.764 GB
Spilling column: 0.745 GB, device: 15.398 GB, host: 23.509 GB
Spilling column: 0.745 GB, device: 14.652 GB, host: 24.254 GB
Spilling column: 0.745 GB, device: 13.906 GB, host: 24.999 GB
Spilling column: 0.745 GB, device: 13.156 GB, host: 25.744 GB
Spilling column: 0.745 GB, device: 12.414 GB, host: 26.489 GB
Spilling column: 0.745 GB, device: 11.668 GB, host: 27.235 GB
Spilling column: 0.745 GB, device: 10.922 GB, host: 27.979 GB
Spilling column: 0.745 GB, device: 10.176 GB, host: 28.724 GB
Spilling column: 0.745 GB, device:  9.426 GB, host: 29.470 GB
Spilling column: 0.745 GB, device:  8.680 GB, host: 30.215 GB
Spilling column: 0.745 GB, device:  7.934 GB, host: 30.960 GB
Spilling column: 0.745 GB, device:  7.188 GB, host: 31.705 GB
Spilling column: 0.745 GB, device:  6.441 GB, host: 32.450 GB
Spilling column: 0.745 GB, device:  5.699 GB, host: 33.195 GB
Spilling column: 0.745 GB, device:  4.949 GB, host: 33.940 GB
Spilling column: 0.745 GB, device:  4.203 GB, host: 34.685 GB
Spilling column: 0.745 GB, device:  3.461 GB, host: 35.430 GB
Spilling column: 0.745 GB, device:  2.715 GB, host: 36.175 GB
Spilling column: 0.745 GB, device:  1.969 GB, host: 36.920 GB
Finished spilling - device:  1.969 GB, host: 36.920 GB
Access spilled dataframes
[ 0] dataframe access, device:  8.680 GB, host: 32.450 GB
[ 1] dataframe access, device: 10.918 GB, host: 30.215 GB
[ 2] dataframe access, device: 13.156 GB, host: 27.980 GB
[ 3] dataframe access, device: 15.399 GB, host: 25.744 GB
[ 4] dataframe access, device: 17.634 GB, host: 23.509 GB
[ 5] dataframe access, device: 19.872 GB, host: 21.274 GB
[ 6] dataframe access, device: 22.096 GB, host: 19.039 GB
[ 7] dataframe access, device: 24.318 GB, host: 16.804 GB
[ 8] dataframe access, device: 26.563 GB, host: 14.569 GB
[ 9] dataframe access, device: 28.805 GB, host: 12.333 GB
[10] dataframe access, device: 29.552 GB, host: 11.588 GB
[11] dataframe access, device: 29.552 GB, host: 11.588 GB
[12] dataframe access, device: 29.547 GB, host: 11.588 GB
[13] dataframe access, device: 29.543 GB, host: 11.588 GB
[14] dataframe access, device: 29.543 GB, host: 11.588 GB
Deleting dataframes - device: 1.937 GB, host: 1.157 GB
Initial/end state delta - device: 0.019531 GB, host: 0.003410 GB

cc. @shwina @quasiben

madsbk avatar Apr 27 '22 11:04 madsbk

Codecov Report

:exclamation: No coverage uploaded for pull request base (branch-22.08@bad00d7). The diff coverage is n/a.

:exclamation: Current head f7d42ec differs from pull request most recent head d1317a6. Consider uploading reports for the commit d1317a6 to get more accurate results

@@               Coverage Diff               @@
##             branch-22.08   #10746   +/-   ##
===============================================
  Coverage                ?   85.90%           
===============================================
  Files                   ?      147           
  Lines                   ?    23123           
  Branches                ?        0           
===============================================
  Hits                    ?    19864           
  Misses                  ?     3259           
  Partials                ?        0           

codecov[bot] avatar Apr 27 '22 13:04 codecov[bot]

Is this dependent on the hacks @shwina did in https://github.com/rapidsai/cudf/pull/10592 or did we find some way around needing to do that?

jrhemstad avatar Apr 27 '22 15:04 jrhemstad

Sync'd offline with Jake, but for completeness: It's not independent. We're going to have to incorporate those changes here eventually. This is not ready for review.

shwina avatar Apr 27 '22 15:04 shwina

This PR has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this PR if it is no longer required. Otherwise, please respond with a comment indicating any updates. This PR will be labeled inactive-90d if there is no activity in the next 60 days.

github-actions[bot] avatar Aug 07 '22 15:08 github-actions[bot]

Closed in favor of https://github.com/rapidsai/cudf/pull/11553

madsbk avatar Oct 03 '22 11:10 madsbk