spline-spark-agent
Can't track org.apache.hadoop.fs.rename
Hi Folks,
Has anyone had trouble with Spark jobs that include many write and file rename operations (org.apache.hadoop.fs.rename)? Spline only records the write to the temporary path, not the rename that follows, so the lineage comes out incorrect. Please help if you have faced this.
Context:
- Spark 3.0.1
- Scala 2.12
- Hadoop 3.1.0
My case:
write('hdfs://abc/tmp/123');
write('hdfs://xyz/tmp/123');
write('hdfs://asd/tmp/123');
rename('hdfs://abc/tmp/123','hdfs://abc/123');
rename('hdfs://xyz/tmp/123','hdfs://xyz/123');
rename('hdfs://asd/tmp/123','hdfs://asd/123');
My current approach is to implement a mapping job that uses the Hadoop audit logs (which contain `org.apache.hadoop.fs.rename`) to correct Spline's write/read operators.
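
To make the idea concrete, here is a minimal sketch of that mapping step in Python. It assumes the HDFS NameNode audit log uses its usual tab-separated key=value layout (cmd=rename, src=..., dst=...) and that the paths in it carry no scheme or authority, so the NameNode name has to be supplied separately. The function names `parse_renames` and `resolve` and the sample log line are hypothetical and not part of Spline or Hadoop.

```python
import re

# Assumed audit line layout (tab-separated key=value pairs), for example:
# ... FSNamesystem.audit: allowed=true<TAB>ugi=etl<TAB>ip=/10.0.0.1<TAB>cmd=rename<TAB>src=/tmp/123<TAB>dst=/123<TAB>perm=...
FIELD = re.compile(r"(\w+)=([^\t]*)")


def parse_renames(lines, authority):
    """Build {source URI: destination URI} from the audit lines of one NameNode.

    `authority` is the NameNode name to prefix (e.g. "abc"); this is an
    assumption, since audit paths are local to the NameNode that logged them.
    """
    renames = {}
    for line in lines:
        fields = dict(FIELD.findall(line.rstrip("\n")))
        # Keep only rename commands that were actually permitted.
        if fields.get("cmd") != "rename" or fields.get("allowed") != "true":
            continue
        src, dst = fields.get("src"), fields.get("dst")
        if src and dst and dst != "null":
            renames[f"hdfs://{authority}{src}"] = f"hdfs://{authority}{dst}"
    return renames


def resolve(uri, renames):
    """Follow rename chains to the final location; unknown URIs pass through."""
    seen = set()
    while uri in renames and uri not in seen:  # `seen` guards against cycles
        seen.add(uri)
        uri = renames[uri]
    return uri


sample = [
    "2021-01-01 10:00:00,000 INFO FSNamesystem.audit: allowed=true\tugi=etl (auth:SIMPLE)"
    "\tip=/10.0.0.1\tcmd=rename\tsrc=/tmp/123\tdst=/123\tperm=etl:hadoop:rwxr-xr-x\tproto=rpc\n",
]
renames = parse_renames(sample, "abc")
print(resolve("hdfs://abc/tmp/123", renames))
```

Since each NameNode (abc, xyz, asd in my case) writes its own audit log, `parse_renames` would be called once per log with the matching authority and the resulting dictionaries merged. The write URIs reported by Spline can then be passed through `resolve` before the lineage is stored or queried. This sketch only matches exact paths; files written underneath a renamed directory would need prefix matching as well.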