Write a PostgreSQL query to a Parquet file

Open andresgutgon opened this issue 9 months ago • 1 comments

Describe your changes

We want to let users cache query results more permanently, and we do this by persisting them to Parquet files. This PR adds the ability to write Parquet files to the Source Manager. It also introduces batched queries in our connectors: a connector can now pull all the data for a query in batches instead of loading it all at once, so huge queries no longer exhaust the memory of the machine running them.
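To illustrate the batching idea, here is a minimal sketch, not the code from this PR. The names `FetchPage`, `batchQuery`, `makeFakeSource` and `collectBatchSizes` are made up for the example, and the data source is an in-memory array. It pages by offset and limit for simplicity, which against a real database would need a stable `ORDER BY`; a real PostgreSQL connector might well use a server-side cursor instead.

```typescript
// Hypothetical shapes for illustration; not the PR's actual API.
type Row = Record<string, unknown>;
type FetchPage = (offset: number, limit: number) => Promise<Row[]>;

// Yields rows in fixed-size batches so that only one batch is held in memory at a time.
async function* batchQuery(fetchPage: FetchPage, batchSize: number): AsyncGenerator<Row[]> {
  if (!Number.isInteger(batchSize) || batchSize <= 0) {
    throw new Error("batchSize must be a positive integer");
  }
  let offset = 0;
  while (true) {
    const rows = await fetchPage(offset, batchSize);
    if (rows.length === 0) return; // nothing left to read
    yield rows;
    if (rows.length < batchSize) return; // short page: the source is exhausted
    offset += rows.length;
  }
}

// In-memory stand-in for a database table with `total` rows.
function makeFakeSource(total: number): FetchPage {
  const all: Row[] = Array.from({ length: total }, (_, i) => ({ id: i }));
  return async (offset, limit) => all.slice(offset, offset + limit);
}

// Consumes the batches one by one and records how many rows each contained.
async function collectBatchSizes(total: number, batchSize: number): Promise<number[]> {
  const sizes: number[] = [];
  for await (const batch of batchQuery(makeFakeSource(total), batchSize)) {
    sizes.push(batch.length);
  }
  return sizes;
}
```

The consumer only ever sees one batch at a time, so peak memory is bounded by the batch size rather than by the size of the full result set.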

TODO

  • [x] Make sure the connector fails if it does not implement the batchQuery method
  • [x] Implement the batchQuery method in the PostgreSQL connector.
  • [x] Infer query schema. This is needed to build the parquet file.
  • [x] Write each batch of rows from the query to a parquet file
  • [x] Find a library that works for writing Parquet files 💀. Finally, I picked @dsnp/parquetjs
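The checklist above could fit together roughly as in the sketch below. It is not the code from this PR: every name (`BaseConnector`, `InMemoryConnector`, `inferSchema`, `RowSink`, `writeQueryInBatches`) is hypothetical. The callback style is synchronous to keep the example short, whereas a real connector would be asynchronous. The sink is an abstract interface standing in for a Parquet writer. The schema is inferred from the values in the first batch, which is an assumption. A real implementation might rely on the column metadata returned by the database driver, since value-based inference can misjudge a column (for example, a float column whose sampled values all happen to be whole numbers).

```typescript
// All names are hypothetical; this is a sketch of the checklist, not the PR's code.
type ResultRow = Record<string, unknown>;
type ColumnType = "BOOLEAN" | "INT64" | "DOUBLE" | "UTF8" | "TIMESTAMP_MILLIS";
type Schema = Record<string, { type: ColumnType; optional: boolean }>;
type OnBatch = (rows: ResultRow[]) => void;

class BaseConnector {
  // Default implementation fails loudly; connectors that support batching override it.
  batchQuery(_sql: string, _onBatch: OnBatch): void {
    throw new Error(`${this.constructor.name} does not implement batchQuery`);
  }
}

// Stand-in for a real connector: replays in-memory rows in fixed-size batches.
class InMemoryConnector extends BaseConnector {
  private readonly rows: ResultRow[];
  private readonly batchSize: number;
  constructor(rows: ResultRow[], batchSize: number) {
    super();
    if (batchSize < 1) throw new Error("batchSize must be at least 1");
    this.rows = rows;
    this.batchSize = batchSize;
  }
  batchQuery(_sql: string, onBatch: OnBatch): void {
    for (let i = 0; i < this.rows.length; i += this.batchSize) {
      onBatch(this.rows.slice(i, i + this.batchSize));
    }
  }
}

// Maps a JavaScript value to a Parquet-style type name (assumed mapping).
function inferColumnType(value: unknown): ColumnType {
  if (typeof value === "boolean") return "BOOLEAN";
  if (typeof value === "bigint") return "INT64";
  if (typeof value === "number") return Number.isInteger(value) ? "INT64" : "DOUBLE";
  if (value instanceof Date) return "TIMESTAMP_MILLIS";
  return "UTF8";
}

// Uses the first non-null value seen for each column; all-null columns fall back to UTF8.
function inferSchema(rows: ResultRow[]): Schema {
  const has = (s: Schema, k: string) => Object.prototype.hasOwnProperty.call(s, k);
  const schema: Schema = {};
  for (const row of rows) {
    for (const [name, value] of Object.entries(row)) {
      if (has(schema, name) || value === null || value === undefined) continue;
      schema[name] = { type: inferColumnType(value), optional: true };
    }
  }
  for (const row of rows) {
    for (const name of Object.keys(row)) {
      if (!has(schema, name)) schema[name] = { type: "UTF8", optional: true };
    }
  }
  return schema;
}

// Minimal interface a file writer would need to satisfy in this sketch.
interface RowSink {
  appendRow(row: ResultRow): void;
  close(): void;
}

// Opens the sink once the first batch reveals the schema, then appends every batch.
function writeQueryInBatches(
  connector: BaseConnector,
  sql: string,
  openSink: (schema: Schema) => RowSink,
): number {
  const state: { sink: RowSink | undefined; written: number } = { sink: undefined, written: 0 };
  try {
    connector.batchQuery(sql, (rows) => {
      if (rows.length === 0) return;
      const sink = state.sink ?? (state.sink = openSink(inferSchema(rows)));
      for (const row of rows) {
        sink.appendRow(row);
        state.written++;
      }
    });
  } finally {
    if (state.sink) state.sink.close(); // always release the sink, even on failure
  }
  return state.written;
}
```

Keeping the sink behind a small interface means the batching logic does not depend on any particular Parquet library, which matters given how hard it was to find one that works.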

andresgutgon · May 16 '24 13:05