cudf
[DOC] low-memory reader options not very discoverable
Recently, we added chunked (low-memory) readers in cudf-python for parquet and json formats.
The only place these features are documented is in the descriptions of the options that globally select whether to use the chunked reader. These options are io.parquet.low_memory and io.json.low_memory, respectively.
These are shown (in an unformatted manner) as the output of describe_option in the options section of the user documentation: https://docs.rapids.ai/api/cudf/nightly/user_guide/api_docs/options/#api-options
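For reference, this is roughly what a user has to know today. It is a minimal sketch, assuming a cudf build that exposes the two options named above; the helper name enable_low_memory_readers and the file path are made up for illustration, and the import is guarded so the snippet also loads on a machine without cudf.

```python
# Sketch only: assumes a cudf build that provides the io.parquet.low_memory
# and io.json.low_memory options discussed in this issue.
try:
    import cudf
except ImportError:  # cudf is not installed in this environment
    cudf = None

LOW_MEMORY_OPTIONS = ("io.parquet.low_memory", "io.json.low_memory")


def enable_low_memory_readers(options=cudf):
    """Hypothetical helper: switch both chunked readers on globally.

    `options` is any object with get_option/set_option (the cudf module
    by default). Returns the previous values so they can be restored.
    """
    previous = {name: options.get_option(name) for name in LOW_MEMORY_OPTIONS}
    for name in LOW_MEMORY_OPTIONS:
        options.set_option(name, True)
    return previous


if __name__ == "__main__" and cudf is not None:
    # Prints the description of the option, as shown in the options docs.
    cudf.describe_option("io.parquet.low_memory")
    enable_low_memory_readers()
    # There is no keyword argument for this on the readers; the option is
    # the only switch. The path below is a placeholder.
    # df = cudf.read_parquet("data.parquet")
```

Nothing in the read_parquet or read_json call itself hints that the option exists, which is the discoverability gap described here.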
If I were looking for information about how to control IO memory usage, I do not think it would occur to me to look here.
I would suggest that:
- chunked reader control is mentioned in the relevant read_parquet and read_json docstrings. This is especially important because there is no keyword argument to control the behaviour; it is only controlled through the option.
- these settings are mentioned in the I/O overview documentation (somewhere here https://docs.rapids.ai/api/cudf/nightly/user_guide/io/)
@galipremsagar Could you take this on?
Sure
It might also be good to have a high-level user guide indicating how to use cudf in low-memory situations. That would include the I/O options as well as things like switching to a managed memory allocator, or tips and tricks for cleaning up intermediate objects to reduce how many allocations stick around.
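As a rough idea of what such a guide could show, here is a small sketch. It assumes RMM is installed and uses rmm.reinitialize(managed_memory=True) to switch to managed memory; the helper names use_managed_memory and drop_intermediates are hypothetical, and the import is guarded so the snippet loads without a GPU stack.

```python
import gc

try:
    import rmm
except ImportError:  # RMM is not installed in this environment
    rmm = None


def use_managed_memory(rmm_module=rmm):
    """Hypothetical helper: switch RMM to CUDA managed (unified) memory.

    Best called before any GPU allocations are made, because
    reinitializing replaces the current memory resource.
    """
    rmm_module.reinitialize(managed_memory=True)


def drop_intermediates(namespace, names):
    """Hypothetical helper: remove named intermediates and collect garbage.

    `namespace` is a dict holding the objects (for example a dict of
    intermediate DataFrames). Returns the names that were removed.
    """
    removed = [name for name in names if namespace.pop(name, None) is not None]
    gc.collect()  # also clears reference cycles that keep buffers alive
    return removed


if __name__ == "__main__" and rmm is not None:
    use_managed_memory()
```

Dropping the last reference to an intermediate is what lets its device buffer be released; the explicit gc.collect() only matters when objects are held in reference cycles.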