fhir-data-pipes
Deploying Spark fails with insufficient max JVM memory
I was testing a simple developer setup for pipelines:
cd ~/gits
git clone https://github.com/google/fhir-data-pipes.git
cd fhir-data-pipes/
chmod a+rx docker/dwh
cd docker/
docker network create cloudbuild
docker-compose -f hapi-compose.yml up --force-recreate -d
curl -H "Content-Type: application/json; charset=utf-8" 'http://localhost:8091/fhir/Patient' -v
cd ..
python3 -m venv venv
venv/bin/pip install -r ./synthea-hiv/uploader/requirements.txt
venv/bin/python3 synthea-hiv/uploader/main.py HAPI http://localhost:8091/fhir --input_dir ./synthea-hiv/sample_data_small/
This finished with some errors, which I ignored: https://paste.googleplex.com/5983969268465664
docker-compose -f docker/compose-controller-spark-sql-single.yaml up --force-recreate
This finished with an error due to insufficient max JVM memory: https://paste.googleplex.com/4827520718864384
Fixed by editing docker/.env to set JAVA_OPTS=-Xms10g -Xmx10g
Should we make this change permanent?
Yes, this is a known issue, but we prefer not to increase the default memory significantly. To see the reasoning, please take a look at where JAVA_OPTS is set here and the comments above it. I am guessing that you are running this on a machine with a lot of cores (how many?). We cannot change the JVM memory configuration in the controller code, but we can check in advance and fail early if we know that the large number of threads is going to fail the pipeline later on.
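For illustration, such an early check could look roughly like this. This is only a sketch, not the actual controller code; the class name and the per-thread heap estimate are made up for the example.

// Hypothetical early check, for illustration only (not the actual controller code):
// fail fast when the requested thread count is unlikely to fit in the configured max heap.
public class EarlyMemoryCheck {

  // Made-up per-thread heap estimate; the real requirement depends on the pipeline.
  private static final long ASSUMED_BYTES_PER_THREAD = 512L * 1024 * 1024; // 512 MiB

  static void checkMemoryForThreads(int numThreads) {
    long maxHeapBytes = Runtime.getRuntime().maxMemory(); // reflects -Xmx from JAVA_OPTS
    long requiredBytes = ASSUMED_BYTES_PER_THREAD * numThreads;
    if (requiredBytes > maxHeapBytes) {
      throw new IllegalStateException(String.format(
          "numThreads=%d needs roughly %d MiB of heap but only %d MiB is available; "
              + "reduce numThreads or increase -Xmx in JAVA_OPTS.",
          numThreads, requiredBytes / (1024 * 1024), maxHeapBytes / (1024 * 1024)));
    }
  }

  public static void main(String[] args) {
    // In the failing scenario, numThreads defaults to the number of cores.
    checkMemoryForThreads(Runtime.getRuntime().availableProcessors());
  }
}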
Another option is to limit the number of threads dynamically, i.e., even if numThreads is set to use all cores (e.g., here), we can override it and set it to a smaller number for which the provided memory is enough. That way the pipeline won't fail but may suffer performance-wise; @chandrashekar-s WDYT about this capping change?
Dynamically capping the number of threads is a good idea when the configured memory is insufficient and numThreads is set to use all cores (or some supposedly optimal value). That way we don't fail the application, but we should also warn the user about it in the logs.
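Roughly something like the following, again just a sketch with a made-up per-thread heap estimate, not the real pipeline code: instead of failing, shrink numThreads to what the configured heap can support and log a warning so the user knows performance may suffer.

import java.util.logging.Logger;

// Hypothetical capping logic, illustration only (not the real pipeline code).
public class ThreadCapExample {

  private static final Logger logger = Logger.getLogger(ThreadCapExample.class.getName());

  // Made-up per-thread heap estimate, same caveat as in the earlier sketch.
  private static final long ASSUMED_BYTES_PER_THREAD = 512L * 1024 * 1024; // 512 MiB

  static int capNumThreads(int requestedThreads) {
    long maxHeapBytes = Runtime.getRuntime().maxMemory(); // reflects -Xmx from JAVA_OPTS
    int affordableThreads = (int) Math.max(1L, maxHeapBytes / ASSUMED_BYTES_PER_THREAD);
    if (requestedThreads > affordableThreads) {
      logger.warning(String.format(
          "Capping numThreads from %d to %d because the max heap is only %d MiB; "
              + "the pipeline will not fail but may run slower.",
          requestedThreads, affordableThreads, maxHeapBytes / (1024 * 1024)));
      return affordableThreads;
    }
    return requestedThreads;
  }

  public static void main(String[] args) {
    int requested = Runtime.getRuntime().availableProcessors(); // "use all cores"
    System.out.println("numThreads = " + capNumThreads(requested));
  }
}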