arthur-redshift-etl

Extract parallelism might exceed capacity and fail

Open bhtucker opened this issue 7 years ago • 1 comment

When extracting with Sqoop, code must be generated and compiled each time an extract_table attempt is made.

With one thread per database source calling Sqoop, the master node may not have enough memory to run all of this code generation and compilation at the same time. The Sqoop calls then fail, even though they would succeed if retried with lower parallelism.
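
One possible mitigation, sketched here under assumptions rather than taken from Arthur's code: cap the number of concurrent extract threads and retry only the failed sources with half as many workers. The names `run_with_backoff`, `extract`, and `max_workers` are hypothetical, and `extract` stands in for whatever callable launches Sqoop for one source.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_backoff(sources, extract, max_workers):
    """Call extract(source) for every source with bounded parallelism.

    Failed sources are retried with half as many workers, down to one.
    Returns a dict mapping each source that still failed to its exception.
    `sources` must be hashable (for example, source names as strings).
    """
    pending = list(sources)
    workers = max(1, min(max_workers, len(pending)))
    while pending:
        # Leaving the "with" block waits for every submitted call to finish.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {source: pool.submit(extract, source) for source in pending}
        failures = {}
        for source, future in futures.items():
            error = future.exception()
            if error is not None:
                failures[source] = error
        if not failures or workers == 1:
            # Either everything succeeded or there is no lower level to try.
            return failures
        pending = list(failures)
        workers = max(1, workers // 2)
    return {}
```

Halving is an arbitrary choice for this sketch. A fixed, lower thread cap sized to the master node's memory would avoid the wasted first attempt, at the cost of slower extracts when memory is plentiful.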

bhtucker, Dec 20 '17 18:12