Spark Lineage zstd-jni conflict
Describe the bug
This occurred with Spark 3.1.2 and datahub-spark-lineage 0.8.43.
Spark lineage relocates the zstd-jni classes. According to the zstd-jni documentation, this should not be done because it can cause native library issues. The result is intermittent errors, depending on whether Java picks up the .so files from datahub-spark-lineage-0.8.43.jar or from zstd-jni-1.4.8-1.jar (included with Spark 3.1).
java.lang.UnsatisfiedLinkError: com.github.luben.zstd.Zstd.setCompressionLevel(JI)I
    at com.github.luben.zstd.Zstd.setCompressionLevel(Native Method)
    at com.github.luben.zstd.ZstdOutputStream.<init>(ZstdOutputStream.java:67)
    at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:223)
    at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:910)
    at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:233)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
    at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:230)
    at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:466)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
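For anyone trying to confirm the conflict, the following is a minimal diagnostic sketch, not part of DataHub or Spark. It lists every classpath location that provides a given resource, using only the standard `ClassLoader.getResources` call. The class name `ClasspathLocator` and the default resource path are illustrative assumptions. The path of the native `.so` inside either jar is not stated in this report, so it would have to be passed as an argument after inspecting the jars.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only.
public class ClasspathLocator {

    /** Returns every location visible to the class loader that provides the resource. */
    static List<URL> locate(String resourcePath) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        try {
            // getResources returns all matches, not only the first one that wins.
            return Collections.list(loader.getResources(resourcePath));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default: the zstd-jni class named in the stack trace above.
        String[] paths = args.length > 0
                ? args
                : new String[] {"com/github/luben/zstd/Zstd.class"};
        for (String path : paths) {
            List<URL> hits = locate(path);
            System.out.println(path + ": " + hits.size() + " location(s)");
            for (URL hit : hits) {
                System.out.println("  " + hit);
            }
        }
    }
}
```

Run with the same classpath as the Spark driver, more than one location for a native library resource would be consistent with the behavior described above, where the library that gets loaded depends on which jar Java chooses.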
Ditto'd in the community channel: https://datahubspace.slack.com/archives/CUMUWQU66/p1669017837146189