Extract common classes from src/scala/microsoft-spark-<version>.
We create multiple jars during our builds to accommodate multiple versions of Apache Spark. In the current approach, the implementation is copied from one version to another and then necessary changes are made.
An ideal approach would be to create a common directory and extract the common classes from the duplicated code. Note that even if a class/code is exactly the same across versions, it cannot be pulled out into a common class if it depends on Apache Spark.
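To make that caveat concrete, here is a minimal Scala sketch; the class names are invented for illustration and do not exist in the repo:

```scala
// Hypothetical illustration only; these classes are not from the repo.

// A helper like this, duplicated verbatim under each
// microsoft-spark-<version> directory, has no Spark dependency and is
// a candidate for extraction into a shared "common" location.
object PayloadHelpers {
  def normalize(name: String): String = name.trim.toLowerCase
}

// This class may also be byte-for-byte identical across versions, but
// because it touches a Spark type it must stay in the version-specific
// source tree and be compiled against that Spark version.
class SessionInspector(spark: org.apache.spark.sql.SparkSession) {
  def appName: String = spark.sparkContext.appName
}
```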
Success Criteria:
- PR that refactors all the classes appropriately
- Documentation for all the classes changed/added
- Documentation on upgrading versions (if it doesn't already exist)
Hi @imback82, I'm happy to volunteer to work on this ticket.
That will be great, thanks @spzSource!
Hi @imback82
Before starting work, I just want to confirm that I correctly understand the suggested approach.
Am I correct in saying that the intention is to create a separate Maven project (for instance, microsoft-spark-common), which would compile into a separate .jar file?
Yea, I think that's one way to do it. But we have to make sure microsoft-spark is still a fat JAR so that we don't break existing customers' pipelines. Or we could put the common files into a common folder that the different projects can each build (not sure if this is doable). Does this make sense?
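Purely as an illustration of that layout: the real build uses Maven, and the project/directory names and Spark versions below are assumptions, but the intended module graph can be sketched in sbt terms, with each version-specific project packaged as a fat JAR so the published artifact keeps its current shape.

```scala
// build.sbt sketch -- illustration only; the actual build is Maven-based,
// and the directory names here are assumptions.

// Spark-free shared code would live here.
lazy val common = (project in file("microsoft-spark-common"))
  .settings(
    name := "microsoft-spark-common"
  )

// Each version-specific project compiles against its own Spark version
// and depends on the common module.
lazy val spark24 = (project in file("microsoft-spark-2.4.x"))
  .dependsOn(common)
  .settings(
    name := "microsoft-spark-2.4.x",
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.5" % "provided"
  )

lazy val spark30 = (project in file("microsoft-spark-3.0.x"))
  .dependsOn(common)
  .settings(
    name := "microsoft-spark-3.0.x",
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.0.0" % "provided"
  )

// Packaging each version-specific project as a fat JAR (e.g. via
// sbt-assembly here, or the Maven Shade/Assembly plugins in the real
// build) would keep microsoft-spark-<version>.jar self-contained, so
// existing customer pipelines are unaffected.
```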
Hi @imback82
It looks like I got stuck right after creating the common Maven module. Almost all classes inherit from org.apache.spark.util.Utils.Logging, which makes it impossible to move such classes into the common module due to the dependency on a specific Spark version.
Am I correct in understanding that removing the Logging inheritance is not an option? In any case, I'm happy to hear any ideas on how to mitigate the problem.
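One conceivable mitigation, sketched here only as an idea and not as the approach settled on in this thread, would be a thin Spark-free logging trait in the common module backed directly by SLF4J (which Spark's own logging delegates to); the trait and class names below are hypothetical.

```scala
// Sketch of one possible mitigation -- an assumption, not a decision
// from this thread. Names are hypothetical.

import org.slf4j.{Logger, LoggerFactory}

// A minimal, Spark-free logging mix-in that could live in the common
// module. It depends only on SLF4J, not on any Spark artifact.
trait CommonLogging {
  @transient protected lazy val log: Logger =
    LoggerFactory.getLogger(getClass)
}

// A class extracted into the common module could then mix this in
// instead of Spark's Logging trait, removing the version-specific
// Spark dependency.
class CallbackRegistry extends CommonLogging {
  def register(id: Int): Unit = log.info(s"Registered callback $id")
}
```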