Optimise reading from local files

Open gruuya opened this issue 1 year ago • 0 comments

Currently we don't handle loading local Parquet files well: it seems the plan loads the entire file into memory before re-partitioning it and uploading to the object store.

For example (using area1.parquet from here, 2.45 GB in size):

CREATE EXTERNAL TABLE area1 STORED AS PARQUET LOCATION '/Users/markogrujic/Downloads/area1.parquet';
CREATE TABLE area1 AS SELECT * FROM staging.area1;

leads to the following memory profile (the first plateau corresponds to reading from the file itself):

[image: memory profile]
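To illustrate the behaviour we would presumably want instead, here is a rough, library-free Python sketch of streaming re-partitioning. It is not Seafowl's or DataFusion's actual code: `read_batches`, `repartition`, `load` and the `upload` callback are all hypothetical names, and the reader simply fabricates rows as a stand-in for a streaming Parquet reader. The point is only that each partition can be uploaded as soon as it fills up, so peak memory would be bounded by roughly one partition plus one input batch rather than by the file size.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]


def read_batches(total_rows: int, batch_size: int) -> Iterator[List[Row]]:
    """Hypothetical stand-in for a streaming Parquet reader.

    Yields one small batch at a time instead of materialising all rows.
    """
    for start in range(0, total_rows, batch_size):
        end = min(start + batch_size, total_rows)
        yield [{"id": i} for i in range(start, end)]


def repartition(batches: Iterable[List[Row]], max_rows: int) -> Iterator[List[Row]]:
    """Regroup incoming batches into partitions of at most max_rows rows."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    current: List[Row] = []
    for batch in batches:
        for row in batch:
            current.append(row)
            if len(current) == max_rows:
                # Hand the full partition downstream and start a new one,
                # so only one partition is buffered at any time.
                yield current
                current = []
    if current:
        # Flush the final, possibly smaller, partition.
        yield current


def load(
    total_rows: int,
    batch_size: int,
    max_rows: int,
    upload: Callable[[List[Row]], None],
) -> int:
    """Stream rows, re-partition them and upload each partition as it fills.

    Returns the number of partitions uploaded. `upload` is an assumed
    callback standing in for a write to the object store.
    """
    count = 0
    for partition in repartition(read_batches(total_rows, batch_size), max_rows):
        upload(partition)
        count += 1
    return count
```

With a real reader in place of `read_batches`, the memory profile should show no initial plateau proportional to the file size, assuming the reader itself does not buffer the whole file.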

gruuya, Sep 08 '22 11:09