Increase multibyte_split performance by improving copy semantics.

Open cwharris opened this issue 2 years ago • 2 comments

reduces unnecessary copying by moving copy logic in to the data_chunk_reader API, introducing a new method copy_next_chunk, which on a per-implementation-basis can decide the best way to copy the underlying data source in to the required specific destination. This API is design to allow single-copy operations where using get_next_chunk may otherwise perform a copy under the hood before the data reaches it's final destination. Consumers should prefer get_next_chunk when the location of the data is not important, allowing device reads to take place in a zero-copy manner.

Thanks @ttnghia for the reminder. :)

before: before after: after

Jul 26 '22 19:07 cwharris

Codecov Report

:exclamation: No coverage uploaded for pull request base (branch-22.10@6d38240). Click here to learn what that means. The diff coverage is n/a.

:exclamation: Current head 0a9cfd4 differs from pull request most recent head fd5f28c. Consider uploading reports for the commit fd5f28c to get more accurate results

@@               Coverage Diff               @@
##             branch-22.10   #11361   +/-   ##
===============================================
  Coverage                ?   86.43%           
===============================================
  Files                   ?      144           
  Lines                   ?    22808           
  Branches                ?        0           
===============================================
  Hits                    ?    19714           
  Misses                  ?     3094           
  Partials                ?        0

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 6d38240...fd5f28c. Read the comment docs.

Jul 26 '22 22:07 codecov[bot]

Rerun tests.

Jul 28 '22 21:07 ttnghia

@cwharris as this no longer applies, do you think the semantics would be useful elsewhere?

Aug 31 '22 15:08 upsj

@upsj yeah, this is something I wanted to add to the data chunk reader from the beginning, I just didn't have a use case for it. Anywhere we need to copy bytes in to a specific location, this is the right API. Otherwise we could be performing an extra copy. That said it has no place in the recently merged multibyte_split implementation.

Aug 31 '22 15:08 cwharris

cudf cudf copied to clipboard

Increase multibyte_split performance by improving copy semantics.

Codecov Report

cudf
cudf copied to clipboard