
Meteor bigquery extractor runs for too long

Open bsushmith opened this issue 3 years ago • 0 comments

Is your feature request related to a problem? Please describe.

We have a BigQuery project with a couple of thousand tables. Whenever we run the Meteor BigQuery extractor on this project, the Meteor cron job takes more than 14 hours, even with max_preview_rows set to zero and table/column profiling set to false.

I assume this is mostly because the code fetches datasets and tables iteratively, one after the other.

Describe the solution you'd like

I would like the extractor cron job to complete faster than it currently does. I can see a couple of approaches at present:

  1. Fetch dataset and table information concurrently instead of one table after another, using a fixed (configurable) number of workers, or by fetching in pages.
  2. Table metadata could also be fetched from the information schema, as detailed in this link. This might have an impact on BigQuery data processing costs, and it requires the BQ service account to have additional access to run jobs.
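
To illustrate the first approach, here is a minimal sketch of a bounded worker pool in Go. It is not Meteor's code: `fetchFunc`, `result`, and `fetchAll` are hypothetical names, and the fetch function is a stand-in for whatever per-table metadata call the extractor makes. The `workers` argument plays the role of the configurable worker count.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchFunc is a hypothetical stand-in for the per-table metadata call.
type fetchFunc func(table string) (string, error)

// result pairs a table name with its fetched metadata or the error hit.
type result struct {
	Table string
	Meta  string
	Err   error
}

// fetchAll fetches metadata for every table using at most `workers`
// concurrent goroutines and returns the results sorted by table name.
func fetchAll(tables []string, workers int, fetch fetchFunc) []result {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan result)

	// Start a fixed number of workers that drain the jobs channel.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range jobs {
				meta, err := fetch(t)
				results <- result{Table: t, Meta: meta, Err: err}
			}
		}()
	}

	// Feed table names to the workers, then signal that no more are coming.
	go func() {
		for _, t := range tables {
			jobs <- t
		}
		close(jobs)
	}()

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	out := make([]result, 0, len(tables))
	for r := range results {
		out = append(out, r)
	}
	// Completion order is nondeterministic, so sort for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].Table < out[j].Table })
	return out
}

func main() {
	tables := []string{"ds1.orders", "ds1.users", "ds2.events"}
	// Fake fetcher used only for demonstration; a real one would call BigQuery.
	fake := func(t string) (string, error) { return "schema of " + t, nil }
	for _, r := range fetchAll(tables, 2, fake) {
		fmt.Println(r.Table, r.Meta, r.Err)
	}
}
```

Errors are collected per table rather than aborting the run, so one failing table would not stop the others from being extracted. The worker count would still need to respect BigQuery API rate limits.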

Additional context

Version: latest main

bsushmith · Jul 29 '22 12:07