Reducing runtime of JSON reader options benchmark

Open shrshi opened this issue 1 year ago • 0 comments

Description

This PR cleans up the JSON reader options benchmark by reducing the number of runtime configurations from 162 to 20. The benchmark is split for two reasons:

  1. normalize_single_quotes and normalize_whitespace are pre-processing options that do not affect each other, so the runtimes of their FSTs are additive and the two options can be benchmarked separately.
  2. The performance of raw input ingestion (row_selection::ALL and row_selection::BYTE_RANGE) is independent of the token generation and tree algorithms, so it does not need to be crossed with the options that affect those stages.
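
To illustrate why splitting independent options shrinks the configuration count, here is a minimal Python sketch. The axis names and values are hypothetical placeholders chosen for the example; they are not the actual axes of the cuDF benchmark and do not reproduce the 162 and 20 figures. The helper names `full_cross_product` and `split_sweeps` are also invented for this sketch.

```python
from itertools import product

# Hypothetical axes and values, for illustration only; these are NOT the
# real axes of the cuDF JSON reader options benchmark.
axes = {
    "normalize_single_quotes": [False, True],
    "normalize_whitespace": [False, True],
    "row_selection": ["ALL", "BYTE_RANGE"],
    "other_option": ["a", "b", "c"],
}


def full_cross_product(axes):
    """Every combination of every axis value (one monolithic benchmark)."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


def split_sweeps(axes, groups):
    """Sweep each group of axes jointly, holding all other axes at their
    first listed value, which is treated here as the default."""
    defaults = {name: values[0] for name, values in axes.items()}
    configs = []
    for group in groups:
        for combo in product(*(axes[name] for name in group)):
            configs.append({**defaults, **dict(zip(group, combo))})
    return configs


full = full_cross_product(axes)
# Each axis is assumed independent of the others, so each gets its own sweep.
split = split_sweeps(axes, [[name] for name in axes])

print(len(full), len(split))  # 24 9
```

The full cross product grows multiplicatively (2 × 2 × 2 × 3 = 24), while independent sweeps grow additively (2 + 2 + 2 + 3 = 9). In this sketch the all-defaults configuration appears once per sweep; a real split could deduplicate it.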

Checklist

  • [X] I am familiar with the Contributing Guidelines.
  • [ ] New or existing tests cover these changes.
  • [ ] The documentation is up to date with these changes.

shrshi, May 07 '24 00:05