seafowl
Optimise reading from local files
Currently we don't handle loading local Parquet files well: the plan appears to load the entire file into memory before re-partitioning it and uploading the partitions to the object store.
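To make the desired behaviour concrete, here is a minimal sketch of streaming re-partitioning, written in plain Python rather than Seafowl's actual Rust/DataFusion code. The names `repartition_streaming`, `partition_size` and `upload` are hypothetical and only for illustration. The idea is that rows are pulled lazily from the source and each partition is handed off as soon as it is full, so at most one partition is buffered at a time.

```python
from itertools import islice
from typing import Callable, Iterable, List, Tuple

Row = Tuple  # hypothetical row type, for illustration only


def repartition_streaming(
    rows: Iterable[Row],
    partition_size: int,
    upload: Callable[[List[Row]], None],
) -> int:
    """Pull rows lazily and upload fixed-size partitions one at a time.

    Only the partition currently being filled is held in memory.
    Returns the number of partitions uploaded.
    """
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")

    row_iter = iter(rows)
    uploaded = 0
    while True:
        # Take at most partition_size rows without materialising the rest.
        partition = list(islice(row_iter, partition_size))
        if not partition:
            break
        upload(partition)  # stand-in for writing to the object store
        uploaded += 1
    return uploaded
```

Under this scheme peak memory should scale with `partition_size` rather than with the size of the input file, assuming the source itself can be read incrementally.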
For example, using area1.parquet from here (2.45 GB):
CREATE EXTERNAL TABLE area1 STORED AS PARQUET LOCATION '/Users/markogrujic/Downloads/area1.parquet';
CREATE TABLE area1 AS SELECT * FROM staging.area1;
This leads to the following memory profile (the first plateau corresponds to reading from the file itself):