graphstorm
[Draft][GSProcessing] Improve Spark config to better support EMR/EMRS, small optimizations
Issue #, if available:
Description of changes:
- Change how we configure the Spark environment: we now create our own config only for SageMaker, since EMR/EMRS come with pre-configured defaults.
- Introduce enum classes to communicate the execution environment and filesystem type more clearly.
- Cache DataFrames that are used multiple times to avoid re-computation.
- Do not use `:` in the data paths, as it can cause errors when moving the data to local filesystems.
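
For reviewers, here is a rough sketch of the idea behind the enum, config, and path changes listed above. It is illustrative only and is not the code in this PR: the names `ExecutionEnv`, `FilesystemType`, `needs_custom_spark_config`, `filesystem_type_of`, and `sanitize_path_component` are hypothetical, as are the enum members and the choice of `_` as the replacement character.

```python
from enum import Enum


class ExecutionEnv(Enum):
    """Where the processing job runs (hypothetical member names)."""
    SAGEMAKER = "sagemaker"
    EMR_ON_EC2 = "emr-on-ec2"
    EMR_SERVERLESS = "emr-serverless"


class FilesystemType(Enum):
    """Kind of storage a path points to (hypothetical member names)."""
    LOCAL = "local"
    S3 = "s3"


def needs_custom_spark_config(env: ExecutionEnv) -> bool:
    # Only SageMaker gets a config built by us; the EMR variants are
    # assumed to ship with usable pre-configured defaults.
    return env is ExecutionEnv.SAGEMAKER


def filesystem_type_of(path: str) -> FilesystemType:
    # Assumption: anything that is not an s3:// URI is treated as local.
    return FilesystemType.S3 if path.startswith("s3://") else FilesystemType.LOCAL


def sanitize_path_component(name: str, replacement: str = "_") -> str:
    # Replace ':' so the resulting path can be copied to local filesystems
    # without errors. The replacement character is an assumption.
    return name.replace(":", replacement)
```

Passing an enum member instead of a raw string or boolean makes the call sites self-describing and lets a type checker catch invalid values.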
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.