PHOENIX-6721 CSV bulkload tool fails with FileNotFoundException if --output points to the S3 location
The problem is that in our code we explicitly set the output committer to FileOutputCommitter, which doesn't work well with S3. HBase had a similar problem, covered by HBASE-18885. I've tried to duplicate the approach used there, but it turns out that the S3 committer doesn't extend FileOutputCommitter, so I have to use the common base class of both - PathOutputCommitter - and use getOutputPath instead of getWorkPath. It works in my manual tests both with and without AWS.
@ss77892 - are there valid use cases for using an outputcommitter that's not a PathOutputCommitter? If not, is there somewhere we can put some validation in, rather than getting a ClassCastException?
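The type-check suggested above can be sketched as follows. This is a minimal illustrative example, not the real Hadoop classes: PathOutputCommitter, FileOutputCommitter, and S3MagicCommitter here are simplified stand-ins (returning plain Strings instead of Path objects) that only mirror the relevant class hierarchy, and destinationOf is a hypothetical helper showing how an instanceof check turns a would-be ClassCastException into a clear error.

```java
// Simplified stand-ins for the Hadoop committer hierarchy (illustration only).
abstract class PathOutputCommitter {
    // Available on every PathOutputCommitter, S3 committers included.
    abstract String getOutputPath();
}

class FileOutputCommitter extends PathOutputCommitter {
    String getOutputPath() { return "hdfs://cluster/out"; }
    // getWorkPath exists only on this subclass.
    String getWorkPath()   { return "hdfs://cluster/out/_temporary"; }
}

// The S3 committer does NOT extend FileOutputCommitter,
// only the common base class.
class S3MagicCommitter extends PathOutputCommitter {
    String getOutputPath() { return "s3a://bucket/out"; }
}

public class CommitterCast {
    // Hypothetical validation helper: check the type before casting
    // instead of letting a blind cast fail with ClassCastException.
    static String destinationOf(Object committer) {
        if (committer instanceof PathOutputCommitter) {
            return ((PathOutputCommitter) committer).getOutputPath();
        }
        throw new IllegalArgumentException(
            "Expected a PathOutputCommitter, got "
            + committer.getClass().getName());
    }

    public static void main(String[] args) {
        // Works for both committer types via the shared base class.
        System.out.println(destinationOf(new FileOutputCommitter()));
        System.out.println(destinationOf(new S3MagicCommitter()));
        // Casting an S3MagicCommitter to FileOutputCommitter would throw
        // ClassCastException, which is why the base class is used instead.
    }
}
```

Under this sketch, casting to FileOutputCommitter only to reach getWorkPath is what breaks on S3; going through the base class and getOutputPath works for both.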
actually, FileOutputCommitter.getOutputPath should work if all you want is to know the dest path of a job. you shouldn't use it for a real mr/spark job on s3a (performance, correctness) or gs (correctness as even v1 isn't resilient to failures in task commit), but to work out the final destination, it's good.
do you have a stack trace of the FNFE?
now, what exactly are you doing with the output committer? just looking for a temporary directory or actually executing task attempts generating different blocks of data and then committing the output at the end?
A fixed version of this was committed from #1765