
When using external JSON materialization: bumping into the default maximum_object_size limit.

Open · firewall413 opened this issue · 1 comment

https://github.com/duckdb/dbt-duckdb/blob/ab16970ba9f616205dcae52a9dcb661c6d8836c6/dbt/include/duckdb/macros/materializations/external.sql#L52

When materializing a table to a JSON file bigger than 30MB, we bump into the following:

Invalid Input Error: "maximum_object_size" of 16777216 bytes exceeded while reading file "s3://xxxxxx.json" (>33554428 bytes). Try increasing "maximum_object_size".

This is likely because the `select * from '{{ read_location }}'` in that macro builds the view through the default `read_json_auto()` call, with its default option values.
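For context, DuckDB's `read_json_auto` accepts a `maximum_object_size` named parameter, so the error goes away when the reader function is called explicitly with a larger value. A sketch of the kind of view the macro would need to generate (the path mirrors the redacted one above; 67108864 bytes, i.e. 64 MiB, is an arbitrary example value, not a recommendation):

```sql
-- Sketch only: what the generated read-back view could look like if the
-- external materialization let callers override the reader options.
-- DuckDB's default maximum_object_size is 16777216 bytes (16 MiB).
create or replace view my_model as
select *
from read_json_auto(
    's3://xxxxxx.json',
    maximum_object_size = 67108864  -- example override: 64 MiB
);
```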

Would it be possible to choose the reader function (`read_json`/`read_parquet`/`read_csv`) and pass its option parameters explicitly?

firewall413 · Jun 26 '24 14:06

Yes, I think so; there would need to be a PR that modifies this function to let you override more of the defaults via the rendered_options dictionary, like we already do for external materializations that use partitioning: https://github.com/duckdb/dbt-duckdb/blob/master/dbt/adapters/duckdb/impl.py#L166
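If the macro were extended that way, the model config might look something like the sketch below. This is hypothetical: today the `options` dict mentioned above feeds the write side (the `COPY` statement and partitioning), and nothing currently forwards reader options into the read-back view, so the `read_options` key here is invented purely to illustrate the shape such a PR could take.

```sql
-- Hypothetical model config: 'read_options' does not exist in dbt-duckdb
-- today; it only illustrates how reader overrides could be expressed.
{{ config(
    materialized='external',
    location='s3://xxxxxx.json',
    format='json',
    read_options={'maximum_object_size': 67108864}
) }}

select * from {{ ref('upstream_model') }}
```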

jwills · Jun 26 '24 16:06