abhinav
And any status updates? I'd be interested to test drive a quasi-mapping-based fusion caller!
Thanks for the tips; I'll experiment.
@gerashegalov True -- `FileInputFormat` does have to do a round of RPCs to `getFileBlockLocations` if `file` is not an instance of `LocatedFileStatus`. I experimented with increasing `mapreduce.input.fileinputformat.list-status.num-threads` to 20 to...
Oh, I guess other FileSystems are affected if the user has set `mapreduce.input.fileinputformat.list-status.num-threads`....
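For anyone who wants to reproduce the experiment, here is a minimal sketch of the setting as a Hadoop configuration property. Where it goes is an assumption on my part (a job-level config file such as `mapred-site.xml`, or the equivalent `-D` option if the job parses generic options); the value 20 is only the number tried above, not a recommendation.

```xml
<!-- Sketch only: raises the number of threads FileInputFormat uses to list input files. -->
<!-- As far as I know the default is 1 (single-threaded listing); 20 is the experimental value from this thread. -->
<property>
  <name>mapreduce.input.fileinputformat.list-status.num-threads</name>
  <value>20</value>
</property>
```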
Did you notice this issue too, pkallos?
How are you solving it? The issue seems dead, but it's real, and I'm happy to code something different if someone proposes a better strategy.
I've tested a few times on a few inputs, and it's listed all files quickly. Add a comment if you find an issue.
Thanks for the bug report! So the error output is exactly `The file isofrags.tar.gz does not exist and thus cannot be cached.`? Sounds like a race condition. Still somewhat mysterious...
in EMR mode. can make this easy; investigate querying AWS with job flow ID
Thanks, Ben -- working on it!