[Draft][GSProcessing] Improve Spark config to better support EMR/EMRS, small optimizations

Open thvasilo opened this issue 1 year ago • 0 comments

Issue #, if available:

Description of changes:

  • Change how we configure the Spark environment: we now create our own config only for SageMaker, since EMR and EMR Serverless (EMRS) come with pre-configured defaults.
  • Introduce enum classes to communicate the execution environment and filesystem type more clearly.
  • Try to cache DataFrames that are used multiple times, to avoid re-computation.
  • Do not use ":" in data paths, as it can cause errors when moving the data to local filesystems.
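
For reviewers, here is a rough sketch of the idea behind the enum, config, and path changes listed above. It is not the code in this PR: the names ExecutionEnv, FilesystemType, needs_custom_spark_config, filesystem_type_for, and sanitize_path_component are hypothetical and only illustrate the pattern, and the "_" replacement character is an assumption.

```python
from enum import Enum


class ExecutionEnv(Enum):
    """Where the processing job runs (hypothetical names, not from the PR)."""

    SAGEMAKER = "sagemaker"
    EMR_ON_EC2 = "emr-on-ec2"
    EMR_SERVERLESS = "emr-serverless"


class FilesystemType(Enum):
    """Kind of filesystem a data path points to (hypothetical names)."""

    LOCAL = "local"
    S3 = "s3"


def needs_custom_spark_config(env: ExecutionEnv) -> bool:
    """Return True only for SageMaker; EMR/EMRS keep their pre-configured defaults."""
    return env is ExecutionEnv.SAGEMAKER


def filesystem_type_for(path: str) -> FilesystemType:
    """Classify a path by its prefix (assumption: only s3:// and local paths occur)."""
    return FilesystemType.S3 if path.startswith("s3://") else FilesystemType.LOCAL


def sanitize_path_component(name: str, replacement: str = "_") -> str:
    """Replace ':' in a single path component (not a full URI, so s3:// is untouched).

    The replacement character is an assumption made for this sketch.
    """
    return name.replace(":", replacement)
```

With this sketch, a component such as "user:follows:user" would become "user_follows_user" before being joined into an output path, and callers can branch on an enum member instead of comparing raw strings.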

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

thvasilo, May 14 '24 23:05