fhir-data-pipes
Enable the pipelines for Flink non-local execution modes as well
Currently the pipelines are validated and tested only for the Flink local execution mode. Validate them and make the necessary changes for the non-local execution modes as well.
The following improvements have been made for the Flink local execution mode:
- Auto-generate the Flink configuration file with appropriate values. The parameters are determined on a best-effort basis so that the pipelines do not fail even under high load. Refer here for details.
- The number of threads (parallelism) defaults to the number of cores on the machine, but can be overridden over here. In local mode, only one worker is created per pipeline by default, and that single worker provides all the parallelism. In non-local mode, however, the cluster can distribute the load across workers (TaskManagers) to achieve the needed parallelism.
- The Parquet row group sizes are now configurable so that the pipeline does not consume too much heap memory; the changes can be found here.
Since resources are more abundant in non-local execution mode, these properties can be fine-tuned for it. A few changes might be needed to suit the needs of the cluster.
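To make the local-mode defaults concrete, here is a minimal sketch of the idea, not the project's actual implementation. The helper names `default_parallelism` and `render_flink_conf` are hypothetical, and mapping parallelism directly to the number of task slots of a single worker is an assumption; only the Flink keys `parallelism.default` and `taskmanager.numberOfTaskSlots` are real configuration options.

```python
import os


def default_parallelism(override=None):
    """Return the override if given, else the number of CPU cores (at least 1)."""
    if override is not None:
        if override < 1:
            raise ValueError("parallelism must be >= 1")
        return override
    # os.cpu_count() may return None on some platforms, so fall back to 1.
    return os.cpu_count() or 1


def render_flink_conf(parallelism):
    """Render a minimal Flink config for a single local worker (assumed layout)."""
    settings = {
        # Default parallelism applied to jobs that do not set their own.
        "parallelism.default": parallelism,
        # Assumption: one worker holds all slots, so slots == parallelism.
        "taskmanager.numberOfTaskSlots": parallelism,
    }
    return "\n".join(f"{key}: {value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    print(render_flink_conf(default_parallelism()), end="")
```

For a non-local cluster the same total parallelism would presumably be split across several TaskManagers, so the slots-per-worker value would be smaller than the default parallelism rather than equal to it.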