elephant-bird
Use FileSystem's listStatus rather than FileInputFormat's in LzoInputFormat
Proposed resolution to #426, which describes how FileInputFormat's listStatus is slow on S3 for input paths spanning many files.
@rangadi, @isnotinvain this seems to have been around for a while and was discussed a bit on its issue. Any thoughts? Should it get merged?
@gerashegalov commented on #426 that this can break compatibility since it does not handle globs.
Also, I am not sure why EB traverses the directory tree again, since super.listStatus is supposed to have done that already.
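To make the glob concern concrete, here is a minimal sketch of the difference between listing a path literally and expanding a pattern before listing. It is not the code from this PR: it uses `java.nio.file` on the local filesystem as a stand-in, and the class and method names (`GlobListingSketch`, `listFiles`, `listFilesWithGlob`) are hypothetical. In Hadoop, the expansion step would correspond to `FileSystem.globStatus`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; local filesystem stands in for HDFS/S3.
final class GlobListingSketch {

    // Lists regular files one level below `dir`. Returns an empty list when
    // `dir` is not an existing directory (e.g. an unexpanded glob pattern).
    static List<Path> listFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return files;
        }
        try (DirectoryStream<Path> children = Files.newDirectoryStream(dir)) {
            for (Path child : children) {
                if (Files.isRegularFile(child)) {
                    files.add(child);
                }
            }
        }
        Collections.sort(files);
        return files;
    }

    // Expands `glob` against the entries of `parent` first, then lists each match.
    static List<Path> listFilesWithGlob(Path parent, String glob) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(parent, glob)) {
            for (Path match : matches) {
                if (Files.isDirectory(match)) {
                    files.addAll(listFiles(match));
                } else if (Files.isRegularFile(match)) {
                    files.add(match);
                }
            }
        }
        Collections.sort(files);
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("glob-sketch");
        Files.createFile(Files.createDirectories(root.resolve("day-01")).resolve("a.lzo"));
        Files.createFile(Files.createDirectories(root.resolve("day-02")).resolve("b.lzo"));

        // Pattern treated as a literal directory name.
        System.out.println(listFiles(root.resolve("day-*")).size());
        // Pattern expanded before listing.
        System.out.println(listFilesWithGlob(root, "day-*").size());
    }
}
```

The sketch assumes a Unix-like filesystem, where `*` is a legal character in a path name. A listing call that skips the expansion step silently returns nothing for pattern inputs, which is the compatibility risk raised above.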