[BUG] Reduce peak memory usage for STRUCT decoding in parquet reader
**Describe the bug**
In the libcudf benchmark `PARQUET_READER_NVBENCH`, the STRUCT data type shows a surprisingly high `peak_memory_usage`. For a 536 MB table, the INTEGRAL data type shows a peak memory usage of 597 MiB, while the STRUCT data type shows 996 MiB for the same table size. If there are good reasons for this difference, we can close the issue; otherwise, we should reduce the extra memory overhead.
| data_type | io_type | cardinality | run_length | Samples | CPU Time | Noise | GPU Time | Noise | bytes_per_second | peak_memory_usage | encoded_file_size |
|---|---|---|---|---|---|---|---|---|---|---|---|
| INTEGRAL | DEVICE_BUFFER | 1000 | 32 | 33x | 15.405 ms | 0.30% | 15.395 ms | 0.29% | 34872906834 | 597.127 MiB | 14.403 MiB |
| FLOAT | DEVICE_BUFFER | 1000 | 32 | 51x | 9.827 ms | 0.26% | 9.818 ms | 0.24% | 54685058116 | 563.539 MiB | 9.888 MiB |
| DECIMAL | DEVICE_BUFFER | 1000 | 32 | 66x | 7.701 ms | 0.49% | 7.691 ms | 0.47% | 69802302000 | 548.740 MiB | 7.213 MiB |
| TIMESTAMP | DEVICE_BUFFER | 1000 | 32 | 1152x | 8.416 ms | 3.03% | 8.406 ms | 3.03% | 63866354457 | 556.717 MiB | 8.719 MiB |
| DURATION | DEVICE_BUFFER | 1000 | 32 | 1392x | 7.919 ms | 2.12% | 7.909 ms | 2.11% | 67879410607 | 612.525 MiB | 8.113 MiB |
| STRING | DEVICE_BUFFER | 1000 | 32 | 928x | 13.539 ms | 1.62% | 13.530 ms | 1.62% | 39678673862 | 669.530 MiB | 8.504 MiB |
| LIST | DEVICE_BUFFER | 1000 | 32 | 7x | 72.190 ms | 0.29% | 72.180 ms | 0.29% | 7437971830 | 558.376 MiB | 24.246 MiB |
| STRUCT | DEVICE_BUFFER | 1000 | 32 | 13x | 41.528 ms | 0.14% | 41.518 ms | 0.14% | 12930954541 | 996.277 MiB | 15.399 MiB |
**Steps/Code to reproduce bug**
Here is an nvbench CLI command you can run to reproduce the above table:
```
./PARQUET_READER_NVBENCH --device 0 --benchmark 0 --axis cardinality=1000 --axis run_length=32
```
**Expected behavior**
INTEGRAL and `STRUCT<INTEGRAL>` decode in the parquet reader should have a similar peak memory footprint.
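As a cross-check outside of nvbench, peak device memory for a single read can be observed with rmm's statistics adaptor. The sketch below is not the benchmark code; it assumes rmm's `statistics_resource_adaptor` exposes `get_bytes_counter().peak`, and uses placeholder files `int.parquet` / `struct_int.parquet` holding the same values once as flat INT64 columns and once as `STRUCT<INT64>` columns.

```cpp
#include <cudf/io/parquet.hpp>
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/mr/device/statistics_resource_adaptor.hpp>

#include <cstddef>
#include <iostream>
#include <string>

// Read one parquet file and report the peak number of device bytes allocated
// through the current memory resource while decoding it.
std::size_t peak_read_bytes(std::string const& path)
{
  rmm::mr::cuda_memory_resource cuda_mr;
  rmm::mr::statistics_resource_adaptor<rmm::mr::cuda_memory_resource> stats_mr{&cuda_mr};
  auto* old_mr = rmm::mr::set_current_device_resource(&stats_mr);

  std::size_t peak = 0;
  {
    auto const options =
      cudf::io::parquet_reader_options::builder(cudf::io::source_info{path}).build();
    auto const result = cudf::io::read_parquet(options);  // decode the whole file
    peak = stats_mr.get_bytes_counter().peak;             // peak bytes during the read
  }  // output table freed here, still through stats_mr

  rmm::mr::set_current_device_resource(old_mr);
  return peak;
}

int main()
{
  // Placeholder inputs: same values stored as flat INT64 columns and wrapped in
  // STRUCT<INT64>; peak memory is expected to be similar for both.
  std::cout << "INTEGRAL peak bytes: " << peak_read_bytes("int.parquet") << "\n";
  std::cout << "STRUCT   peak bytes: " << peak_read_bytes("struct_int.parquet") << "\n";
  return 0;
}
```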
**Environment overview (please complete the following information)**
- docker image: `rapidsai/ci-conda:cuda12.1.1-ubuntu22.04-py3.11`, pulled on 2024-02-03
- cudf: `branch-24.02` and sha `6cebf2294ff`
**Additional context**
The chunked parquet reader does seem to reduce the memory footprint of STRUCT decode, but the footprint still trends higher than for the other data types.
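For reference, reading through the chunked parquet reader with a bounded output size looks roughly like the sketch below; the 128 MiB limit and the file path are placeholders chosen only for illustration.

```cpp
#include <cudf/io/parquet.hpp>

#include <cstddef>
#include <string>

void read_in_chunks(std::string const& path)
{
  auto const options =
    cudf::io::parquet_reader_options::builder(cudf::io::source_info{path}).build();

  // Cap the size of each returned table chunk; value chosen only for illustration.
  std::size_t constexpr chunk_read_limit = 128 * 1024 * 1024;
  cudf::io::chunked_parquet_reader reader(chunk_read_limit, options);

  while (reader.has_next()) {
    auto chunk = reader.read_chunk();  // table_with_metadata for this chunk
    // ... consume chunk.tbl ...
  }
}
```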
There seems to be a bug in `decode_page_data` that causes a double allocation of the nested string column: two `out_buf` objects allocate string data for the same `src_col_index`. This does not happen when there are two columns in the struct.
After further isolation: the bug happens only when the string column is the first child of the second column. This case seems to break the `owning_schema` logic.
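To make the suspected pattern concrete, here is a hypothetical sketch; it is not the actual cudf source, and the simplified `out_buf` struct and `plan_string_allocations` helper are illustrative only.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Simplified stand-in for the reader's output-buffer bookkeeping (illustrative only).
struct out_buf {
  int src_col_index;            // source column this buffer is decoded from
  bool owns_string_data;        // should be true for exactly one buffer per source column
  std::size_t string_size = 0;  // bytes of string data this buffer will allocate
};

// Sum the string bytes that will actually be allocated. Intended behavior: each
// src_col_index contributes once. Suspected bug: when the string column is the
// first child of the second struct column, two buffers end up flagged as owners
// of the same src_col_index, so that column's string data is sized (and later
// allocated) twice.
std::size_t plan_string_allocations(std::vector<out_buf>& bufs,
                                    std::unordered_map<int, std::size_t> const& bytes_per_col)
{
  std::size_t total = 0;
  for (auto& b : bufs) {
    if (!b.owns_string_data) { continue; }
    auto const it = bytes_per_col.find(b.src_col_index);
    if (it == bytes_per_col.end()) { continue; }
    b.string_size = it->second;
    total += b.string_size;  // double-counted if two buffers claim the same column
  }
  return total;
}
```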
CC @nvdbaranec
Opened https://github.com/rapidsai/cudf/pull/15061, which fixes the peak memory use in benchmarks (structs are now in line with the memory use of their nested types).
Closed by #15061