Factor out all common spark/hadoop properties
Problem
We currently have largely overlapping spark.gradle files, especially in terms of Spark properties.
$ git ls-files | grep spark.gradle
gradle/spark.gradle
helloworld/gradle/spark.gradle
templates/simple/spark.gradle
Solution
Provide a way to have a single spark.gradle, or at least a single spark-transmogrifai.conf file with common properties that is passed to Spark via --properties-file.
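As a rough illustration of the --properties-file idea, a Gradle Exec task could pass the shared file to spark-submit. This is only a sketch: the task name, the main class, the jar path, and the location of spark-transmogrifai.conf at the root of the build are assumptions, and it expects spark-submit to be on the PATH.

```groovy
// Sketch only: task name, main class, and jar path are hypothetical placeholders.
task sparkSubmit(type: Exec) {
    // Assumption: the shared Spark/Hadoop properties file sits at the root of the build.
    def sparkConf = new File(rootDir, 'spark-transmogrifai.conf')
    commandLine 'spark-submit',
        '--properties-file', sparkConf.absolutePath,
        '--class', 'com.example.Main',
        'build/libs/app.jar'
}
```

With something like this, each spark.gradle would only need to know where the common file lives instead of repeating every property.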
Alternatives
- common properties file
- refactored spark.gradle
Additional context
DRY
Hey @gerashegalov
I'm guessing this issue is still a concern, since helloworld/gradle/spark.gradle and templates/simple/spark.gradle are duplicates of gradle/spark.gradle but are still tracked by git.
In order to keep a single spark.gradle file, can we simply change the spark.gradle paths in build.gradle to reference ../gradle/spark.gradle?
PS. I'm fairly new to the project. Pardon me if I'm missing something. 😅
Hi @py-ranoid, thanks for looking into this issue. It makes sense; however, if possible we should strive to use absolute paths built from project properties (to avoid relative-path attacks via symlinks, etc.).
How about keeping only spark.gradle in the repository but copying it to helloworld/gradle/ and templates/simple/ during installation?
@tovbinm @gerashegalov Could you suggest a solution?
- Removing helloworld/gradle/spark.gradle and templates/simple/spark.gradle and referring to gradle/spark.gradle using relative paths
- Keeping only spark.gradle but copying it to helloworld/gradle/ and templates/simple/ during installation
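For reference, the second option (copying during installation) might look roughly like the following in the root build. The task name is hypothetical, and the sketch assumes it runs from the repository root, where gradle/spark.gradle is the single tracked copy.

```groovy
// Sketch only: syncSparkGradle is a hypothetical task name.
task syncSparkGradle {
    doLast {
        // Copy the single tracked spark.gradle into each directory that expects its own copy.
        ['helloworld/gradle', 'templates/simple'].each { targetDir ->
            copy {
                from 'gradle/spark.gradle'
                into targetDir
            }
        }
    }
}
```

The copies would then have to be ignored by git, otherwise the duplication simply comes back.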
Since helloworld is a source-controlled directory rather than an installed one, option 1 seems better (and I think you should be able to construct an absolute path).
@gerashegalov In that case, can I replace
apply from: 'gradle/spark.gradle'
with
apply from: "${rootProject.projectDir}/../gradle/spark.gradle"
in helloworld/build.gradle ?
Would this still be vulnerable to a relative path attack ?
Also, I noticed that the following are duplicates too.
- helloworld/gradle/scalastyle-config.xml and gradle/scalastyle-config.xml
- helloworld/gradle/wrapper/* and gradle/wrapper/*
Would you suggest factoring these out as well?
@gerashegalov @tovbinm Thoughts?
Hi @py-ranoid, I suggest you try it out, and don't hesitate to submit a PR. We can discuss it more concretely on the PR. It does not have to be perfect, just something to iterate on. The preference is to avoid '..'
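One possible way to avoid '..' is to build the path from the rootDir project property. This is a minimal sketch, assuming helloworld is a standalone build (so rootDir is the helloworld directory) that sits directly under the repository root; repoRoot is just an illustrative variable name.

```groovy
// helloworld/build.gradle (sketch; assumes helloworld/ sits directly under the repository root)
// rootDir is the absolute directory of this build; parentFile steps up one level without '..'.
def repoRoot = rootDir.parentFile
apply from: new File(repoRoot, 'gradle/spark.gradle')
```

This keeps the path absolute and free of '..', although it still depends on the directory layout and does not by itself guard against symlinked directories.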