PHOENIX-6721 CSV bulkload tool fails with FileNotFoundException if --output points to the S3 location
The problem is that in our code we explicitly set the output committer to FileOutputCommitter, which doesn't work well with S3. HBase had a similar problem, covered by HBASE-18885. I've tried to duplicate the approach used there, but it turns out that the S3 committer doesn't extend FileOutputCommitter, so I have to use the common base class of both - PathOutputCommitter - and use getOutputPath instead of getWorkPath. It works in my manual tests both with and without AWS.
@ss77892 - are there valid use cases for using an outputcommitter that's not a PathOutputCommitter? If not, is there somewhere we can put some validation in, rather than getting a ClassCastException?
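The type-check suggested above can be sketched as follows. This is a minimal illustrative example, not the real Hadoop classes: PathOutputCommitter, FileOutputCommitter, and S3MagicCommitter here are simplified stand-ins (returning plain Strings instead of Path objects) that only mirror the relevant class hierarchy, and destinationOf is a hypothetical helper showing how an instanceof check turns a would-be ClassCastException into a clear error.

```java
// Simplified stand-ins for the Hadoop committer hierarchy (illustration only).
abstract class PathOutputCommitter {
    // Available on every PathOutputCommitter, S3 committers included.
    abstract String getOutputPath();
}

class FileOutputCommitter extends PathOutputCommitter {
    String getOutputPath() { return "hdfs://cluster/out"; }
    // getWorkPath exists only on this subclass.
    String getWorkPath()   { return "hdfs://cluster/out/_temporary"; }
}

// The S3 committer does NOT extend FileOutputCommitter,
// only the common base class.
class S3MagicCommitter extends PathOutputCommitter {
    String getOutputPath() { return "s3a://bucket/out"; }
}

public class CommitterCast {
    // Hypothetical validation helper: check the type before casting
    // instead of letting a blind cast fail with ClassCastException.
    static String destinationOf(Object committer) {
        if (committer instanceof PathOutputCommitter) {
            return ((PathOutputCommitter) committer).getOutputPath();
        }
        throw new IllegalArgumentException(
            "Expected a PathOutputCommitter, got "
            + committer.getClass().getName());
    }

    public static void main(String[] args) {
        // Works for both committer types via the shared base class.
        System.out.println(destinationOf(new FileOutputCommitter()));
        System.out.println(destinationOf(new S3MagicCommitter()));
        // Casting an S3MagicCommitter to FileOutputCommitter would throw
        // ClassCastException, which is why the base class is used instead.
    }
}
```

Under this sketch, casting to FileOutputCommitter only to reach getWorkPath is what breaks on S3; going through the base class and getOutputPath works for both.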
actually, FileOutputCommitter.getOutputPath should work if all you want is to know the dest path of a job. you shouldn't use it for a real mr/spark job on s3a (performance, correctness) or gs (correctness as even v1 isn't resilient to failures in task commit), but to work out the final destination, it's good.
do you have a stack trace of the FNFE?
now, what exactly are you doing with the output committer? just looking for a temporary directory or actually executing task attempts generating different blocks of data and then committing the output at the end?
A fixed version of this was committed from #1765