cudf
[FEA] Add python bindings in the parquet reader for `num_rows`/`skiprows`
Is your feature request related to a problem? Please describe.
Unfortunately, there has been churn in libcudf around support for `num_rows`/`skiprows` in the Parquet and ORC readers. In 22.08 we deprecated these parameters in the Parquet reader (#11218), and in 22.10 we removed them from C++ (#11503) and Python (#11480). We also deprecated `num_rows`/`skiprows` in the ORC reader (#11522, see issue #11519).
At this point, we realized that chunked parquet reading (#11867) would require adding num_rows/skiprows back to the C++ implementation (#11657).
Let's stabilize row selection APIs in libcudf by completing these tasks:
- [ ] Add python bindings in the parquet reader for `num_rows`/`skiprows`
- [ ] Remove the deprecation notice in the ORC reader for `num_rows`/`skiprows` (#11522)
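For reference, here is a rough sketch of the row selection semantics these parameters are generally understood to have: skip the first `skiprows` rows, then read at most `num_rows` rows. This is plain Python for illustration only, not libcudf or cuDF code. The function name `plan_row_selection` and the row-group-size input are hypothetical, and the clamping behavior at the end of the file is an assumption.

```python
from typing import List, Optional, Tuple


def plan_row_selection(
    row_group_sizes: List[int],
    skiprows: int = 0,
    num_rows: Optional[int] = None,
) -> List[Tuple[int, int, int]]:
    """Map a (skiprows, num_rows) request onto per-row-group slices.

    Hypothetical helper: returns (row_group_index, start, stop) tuples,
    where start/stop are offsets within that row group. Assumes that
    requests past the end of the file are clamped rather than rejected.
    """
    if skiprows < 0:
        raise ValueError("skiprows must be non-negative")
    if num_rows is not None and num_rows < 0:
        raise ValueError("num_rows must be non-negative or None")

    total = sum(row_group_sizes)
    first = min(skiprows, total)  # first selected row (global index)
    last = total if num_rows is None else min(first + num_rows, total)

    plan = []
    offset = 0  # global index of the first row in the current row group
    for index, size in enumerate(row_group_sizes):
        lo = max(first, offset)
        hi = min(last, offset + size)
        if lo < hi:  # this row group overlaps the requested range
            plan.append((index, lo - offset, hi - offset))
        offset += size
    return plan


if __name__ == "__main__":
    # Three row groups of 100, 100, and 50 rows; skip 150 rows, read 60.
    print(plan_row_selection([100, 100, 50], skiprows=150, num_rows=60))
```

With these inputs the request touches only the second and third row groups, which is the kind of partial read that chunked reading depends on.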
Additional context
We also dropped num_rows/skiprows support in the cuDF-python fuzz tests (#11505). My preference is to not include any python fuzz testing changes in the scope of this issue.
Planning on implementing this as part of porting the parquet reader to pylibcudf.