
Factor out all common spark/hadoop properties

Open gerashegalov opened this issue 7 years ago • 8 comments

Problem: We currently have largely overlapping spark.gradle files, especially in terms of Spark properties.

$ git ls-files | grep spark.gradle
gradle/spark.gradle
helloworld/gradle/spark.gradle
templates/simple/spark.gradle

Solution: Provide a way to have a single spark.gradle, or at least a single spark-transmogrifai.conf file with common properties that is passed to Spark via --properties-file.
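A minimal sketch of the single-properties-file idea, assuming the shared file is kept at gradle/spark-transmogrifai.conf under the repository root. The task name sparkSubmit and the mainClass and appJar properties are hypothetical placeholders, not taken from the existing spark.gradle. Only the --properties-file and --class flags of spark-submit are relied on.

```groovy
// gradle/spark.gradle (sketch, not the project's actual script)
// Assumption: all common Spark/Hadoop properties live in this one file.
ext.sparkPropertiesFile = new File(rootProject.projectDir, 'gradle/spark-transmogrifai.conf')

// Hypothetical task: launches spark-submit with the shared properties file.
task sparkSubmit(type: Exec) {
    executable 'spark-submit'
    // Absolute path built from a project property, so no relative lookup is involved.
    args '--properties-file', sparkPropertiesFile.absolutePath
    // mainClass and appJar are placeholder properties supplied with -P on the command line.
    args '--class', project.findProperty('mainClass') ?: 'com.example.Main'
    args project.findProperty('appJar') ?: 'app.jar'
}
```

With this layout, each spark.gradle would only need to know where the shared file is; the properties themselves would be declared once.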

Alternatives

  • common properties file
  • refactored spark.gradle

Additional context: DRY

gerashegalov avatar Aug 24 '18 18:08 gerashegalov

Hey @gerashegalov, I'm guessing this issue is still a concern, since helloworld/gradle/spark.gradle and templates/simple/spark.gradle are duplicates of gradle/spark.gradle but are still being tracked by git.

In order to keep a single spark.gradle file, can we simply replace the spark.gradle paths in build.gradle to reference spark.gradle as ../gradle/spark.gradle ?

PS. I'm fairly new to the project. Pardon me if I'm missing something. 😅

py-ranoid avatar Mar 03 '19 18:03 py-ranoid

Hi @py-ranoid, thanks for looking into this issue. It makes sense; however, if possible we should strive to use absolute paths built from project properties (to avoid dealing with relative path attacks with symlinks etc.).

gerashegalov avatar Mar 04 '19 22:03 gerashegalov

How about keeping only spark.gradle in the repository but copying it to helloworld/gradle/ and templates/simple/ during installation?

py-ranoid avatar Mar 05 '19 02:03 py-ranoid

@tovbinm @gerashegalov Could you suggest a solution?

  1. Removing helloworld/gradle/spark.gradle and templates/simple/spark.gradle and referring to gradle/spark.gradle using relative paths
  2. Keeping only spark.gradle but copying it to helloworld/gradle/ and templates/simple/ during installation

py-ranoid avatar Mar 07 '19 07:03 py-ranoid

Since helloworld is a source-controlled directory rather than an installed one, option 1 seems better (and I think you should be able to construct an absolute path).

gerashegalov avatar Mar 07 '19 07:03 gerashegalov

@gerashegalov In that case, can I replace apply from: 'gradle/spark.gradle' with apply from: "${rootProject.projectDir}/../gradle/spark.gradle" in helloworld/build.gradle? Would this still be vulnerable to a relative path attack?

Also, I noticed that the following are duplicates too.

  1. helloworld/gradle/scalastyle-config.xml and gradle/scalastyle-config.xml.
  2. helloworld/gradle/wrapper/* and gradle/wrapper/*

Would you suggest factoring these out as well ?

py-ranoid avatar Mar 09 '19 16:03 py-ranoid

@gerashegalov @tovbinm Thoughts?

py-ranoid avatar Mar 23 '19 17:03 py-ranoid

Hi @py-ranoid, I suggest you try it out, and don't hesitate to submit a PR. We can discuss it more concretely on the PR. It does not have to be perfect, just something to iterate on. The preference is to avoid '..' in paths.
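One way to avoid '..' could look like the sketch below. It assumes helloworld is a standalone Gradle build sitting directly under the repository root, so that rootProject.projectDir.parentFile is the repository root. The variable names are illustrative only, and the canonical-path check is an added assumption about how the symlink concern might be handled.

```groovy
// helloworld/build.gradle (sketch)
// Assumption: the parent of this build's root directory is the repository root.
def repoRoot = rootProject.projectDir.parentFile
def sharedSparkScript = new File(repoRoot, 'gradle/spark.gradle')

// Resolve symlinks and confirm the script still lives inside the repository.
def canonicalRoot = repoRoot.canonicalFile.toPath()
def canonicalScript = sharedSparkScript.canonicalFile.toPath()
if (!sharedSparkScript.isFile() || !canonicalScript.startsWith(canonicalRoot)) {
    throw new GradleException("Shared spark.gradle not found inside ${repoRoot}")
}

// apply from: accepts a java.io.File, so no path string is needed.
apply from: sharedSparkScript
```

If helloworld were instead a subproject of the main build, rootProject.file('gradle/spark.gradle') would give the same absolute path without any parent lookup.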

gerashegalov avatar Mar 26 '19 00:03 gerashegalov