
[DOC] low-memory reader options not very discoverable

Open wence- opened this issue 1 year ago • 3 comments

Recently, we added chunked (low-memory) readers in cudf-python for parquet and json formats.

The only place these features are documented is in the option values that globally select whether to use the chunked reader. These options are io.parquet.low_memory and io.json.low_memory, respectively.

These are shown (unformatted) as the output of describe_option in the options section of the user documentation: https://docs.rapids.ai/api/cudf/nightly/user_guide/api_docs/options/#api-options
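
For reference, a minimal sketch of switching both options on globally. It assumes a cudf version that ships the two options named above; the helper name enable_low_memory_readers is hypothetical, while set_option and get_option are the existing cudf option functions.

```python
# Option names as given in this issue.
LOW_MEMORY_OPTIONS = ("io.parquet.low_memory", "io.json.low_memory")


def enable_low_memory_readers(enabled=True):
    """Toggle the chunked (low-memory) parquet and JSON readers globally.

    Hypothetical helper: it only wraps cudf.set_option / cudf.get_option.
    """
    import cudf  # imported lazily so this file loads on machines without a GPU

    for name in LOW_MEMORY_OPTIONS:
        cudf.set_option(name, enabled)
    # Report the values that are now in effect.
    return {name: cudf.get_option(name) for name in LOW_MEMORY_OPTIONS}
```

After calling enable_low_memory_readers(), subsequent cudf.read_parquet and cudf.read_json calls in the same process should pick up the chunked readers, since the setting is global.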

If I were looking for information about how to control I/O memory usage, I would not think to look there.

I would suggest that:

  • chunked reader control is mentioned in the relevant read_parquet and read_json docstrings. This is especially important because there is no keyword argument to control the behaviour; it is controlled only through the option.
  • these settings are mentioned in the I/O overview documentation (somewhere here https://docs.rapids.ai/api/cudf/nightly/user_guide/io/)
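
To illustrate what a docstring note might point users towards: since there is no keyword argument, one way to get per-call control is to wrap the read in cudf.option_context. This is only a sketch under that assumption; read_parquet_low_memory is a hypothetical wrapper, not part of cudf.

```python
def read_parquet_low_memory(path, **kwargs):
    """Read one parquet file with the chunked reader enabled for this call only.

    Hypothetical wrapper: the option is restored to its previous value
    when the context manager exits, so other reads are unaffected.
    """
    import cudf  # imported lazily so this file loads on machines without a GPU

    with cudf.option_context("io.parquet.low_memory", True):
        return cudf.read_parquet(path, **kwargs)
```

The same pattern would apply to read_json with io.json.low_memory.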

wence- avatar Jul 31 '24 09:07 wence-

@galipremsagar Could you take this on?

bdice avatar Jul 31 '24 12:07 bdice

@galipremsagar Could you take this on?

Sure

galipremsagar avatar Jul 31 '24 12:07 galipremsagar

It might also be good to have a high-level user guide explaining how to use cudf in low-memory situations. That would include the I/O options as well as things like switching to a managed memory allocator and tips and tricks for cleaning up intermediate objects to reduce how many allocations stick around.
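
As a rough idea of what such a guide could show for the allocator part, here is a sketch that switches RMM to managed memory. It assumes rmm is installed alongside cudf and that the call happens before any GPU allocations are made; use_managed_memory is a hypothetical helper name.

```python
def use_managed_memory():
    """Switch the default RMM allocator to CUDA managed (unified) memory.

    Hypothetical helper around rmm.reinitialize; intended to be called
    once at start-up, before any cudf objects exist.
    """
    import rmm  # imported lazily so this file loads on machines without a GPU

    rmm.reinitialize(managed_memory=True)
```

For the clean-up part, the usual Python approach applies: drop references to intermediates (for example with del) once they are no longer needed, so that their device allocations can be released.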

vyasr avatar Aug 16 '24 19:08 vyasr