spline-spark-agent

Can't track org.apache.hadoop.fs.rename

Open vinhnemo opened this issue 1 year ago • 0 comments

Hi Folks,

Has anyone run into problems when a Spark job includes many write and file rename operations (org.apache.hadoop.fs.rename)? Spline records the writes but not the renames, so the resulting lineage is incorrect. Please help if you have faced this.

Context:

  • Spark 3.0.1
  • Scala 2.12
  • Hadoop 3.1.0

My case:

write('hdfs://abc/tmp/123');
write('hdfs://xyz/tmp/123');
write('hdfs://asd/tmp/123');
rename('hdfs://abc/tmp/123','hdfs://abc/123');
rename('hdfs://xyz/tmp/123','hdfs://xyz/123');
rename('hdfs://asd/tmp/123','hdfs://asd/123');

My current approach is to implement a mapping job that uses the Hadoop audit logs (which contain `org.apache.hadoop.fs.rename` entries) to correct Spline's write/read operations.
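A minimal sketch of what such a mapping job could look like, to make the idea concrete. Everything here is an assumption for illustration: the audit line shape (key=value fields containing `cmd=rename`, `src=` and `dst=`), the helper names `parse_renames` and `resolve_path`, and the idea that the cluster authority (`abc`, `xyz`, ...) is supplied separately because the logged paths may not carry it. None of this is part of Spline.

```python
import re

# Assumed audit-line shape: whitespace-separated key=value fields that include
# cmd=rename, src=<path> and dst=<path>. Adjust the pattern to your real logs.
RENAME_RE = re.compile(r"cmd=rename\s+src=(\S+)\s+dst=(\S+)")


def parse_renames(audit_lines, authority):
    """Return (src_uri, dst_uri) pairs for rename entries, in log order.

    `authority` is the hypothetical cluster name to prefix, e.g. "abc".
    """
    pairs = []
    for line in audit_lines:
        match = RENAME_RE.search(line)
        if match:
            src, dst = match.groups()
            pairs.append((f"hdfs://{authority}{src}", f"hdfs://{authority}{dst}"))
    return pairs


def resolve_path(uri, renames):
    """Apply renames in order; renaming a directory also moves paths under it."""
    for src, dst in renames:
        src = src.rstrip("/")
        if uri == src or uri.startswith(src + "/"):
            uri = dst + uri[len(src):]
    return uri


# Example with one made-up audit line for the "abc" cluster.
lines = ["allowed=true\tugi=etl\tcmd=rename\tsrc=/tmp/123\tdst=/123\tperm=etl:etl:rwxr-xr-x"]
renames = parse_renames(lines, "abc")
final_path = resolve_path("hdfs://abc/tmp/123", renames)
```

The resolved URI would then replace the temporary write target that Spline recorded, so the lineage points at the path the data ends up in. Paths on other clusters are left untouched because the rename pairs are qualified with the authority.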

vinhnemo, Oct 27 '23 02:10