arthur-redshift-etl
Extract parallelism might exceed capacity and fail
When extracting with Sqoop, code must be generated and compiled each time an extract_table attempt is made.
With one thread per database source calling Sqoop, the master node may not have enough memory for all of this code generation and compilation. The Sqoop calls then fail, although they would succeed if retried with lower parallelism.
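
One possible mitigation, sketched below, is to retry only the failed sources with fewer threads. This is a minimal illustration and not the project's actual code: `extract_with_backoff`, `extract_source`, and `max_workers` are hypothetical names, and the sketch assumes that a failed extract for one source surfaces as an exception.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def extract_with_backoff(
    sources: List[str],
    extract_source: Callable[[str], None],  # hypothetical: runs the extract for one source
    max_workers: int,
) -> Dict[str, BaseException]:
    """Extract all sources, retrying failed ones with half as many threads.

    Returns the sources that still failed when run one at a time,
    mapped to the last exception raised for each.
    """
    pending = list(sources)
    workers = max(1, min(max_workers, len(pending) or 1))
    failures: Dict[str, BaseException] = {}
    while pending:
        failures = {}
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = {source: pool.submit(extract_source, source) for source in pending}
        # Leaving the "with" block waits until every submitted call has finished.
        for source, future in futures.items():
            error = future.exception()
            if error is not None:
                failures[source] = error
        if not failures or workers == 1:
            break
        # Retry only what failed, with lower parallelism.
        pending = list(failures)
        workers = max(1, workers // 2)
    return failures
```

In this sketch, `extract_source` would wrap the Sqoop call for a single source and raise when the call fails. Halving the thread count is an arbitrary choice; dropping straight to a single thread would also work and trades speed for fewer retries.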