mrjob
don't clobber files in Spark tasks' working directories
It looks like Spark on YARN puts the following files/directories into a Spark container's working directory on EMR AMI 5.16.10 (which runs Spark 2.3.1 on Hadoop 2.8.4):
.container_tokens.crc
.default_container_executor.sh.crc
.default_container_executor_session.sh.crc
.launch_container.sh.crc
__spark_conf__
__spark_libs__
container_tokens
default_container_executor.sh
default_container_executor_session.sh
launch_container.sh
py4j-0.10.7-src.zip
pyspark.zip
tmp
We should make sure that the working directory manager doesn't allow clobbering these files.
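One possible approach, sketched below, is to treat the names listed above as already taken before assigning any user-supplied names. This is only a sketch under assumptions. `SPARK_RESERVED_NAMES`, `WorkingDirNames`, `claim()`, and `auto_name()` are hypothetical names, not mrjob's actual working directory manager API. The reserved set is exactly the list observed above on EMR AMI 5.16.10, and other Spark/Hadoop versions may place different files there. Explicitly requested names that collide are rejected. Auto-named files get a numeric suffix instead.

```python
import posixpath

# Hypothetical: files/dirs Spark on YARN was observed to place in a container's
# working directory on EMR AMI 5.16.10 (Spark 2.3.1, Hadoop 2.8.4).
SPARK_RESERVED_NAMES = frozenset([
    '.container_tokens.crc',
    '.default_container_executor.sh.crc',
    '.default_container_executor_session.sh.crc',
    '.launch_container.sh.crc',
    '__spark_conf__',
    '__spark_libs__',
    'container_tokens',
    'default_container_executor.sh',
    'default_container_executor_session.sh',
    'launch_container.sh',
    'py4j-0.10.7-src.zip',
    'pyspark.zip',
    'tmp',
])


class WorkingDirNames(object):
    """Hypothetical tracker of names used in a task's working directory.

    Reserved names start out as "taken", so nothing we upload can clobber them.
    """

    def __init__(self, reserved=SPARK_RESERVED_NAMES):
        self._taken = set(reserved)

    def claim(self, name):
        """Claim an explicitly requested name; refuse to clobber anything."""
        if name in self._taken:
            raise ValueError(
                '%r is already taken in the working directory' % name)
        self._taken.add(name)
        return name

    def auto_name(self, path):
        """Pick a free name based on *path*'s basename.

        If the basename is taken, insert -1, -2, ... before the extension.
        """
        base = posixpath.basename(path)
        root, ext = posixpath.splitext(base)
        candidate = base
        i = 1
        while candidate in self._taken:
            candidate = '%s-%d%s' % (root, i, ext)
            i += 1
        self._taken.add(candidate)
        return candidate
```

For example, uploading a user's own `pyspark.zip` without an explicit name would land in the working directory as `pyspark-1.zip`, while explicitly asking for the name `__spark_libs__` raises an error instead of silently overwriting Spark's directory.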
This is a pretty low priority; just adding this ticket now while I have the list of files.