dfs-datastores
ClassFiles for Consolidator
Hello,
I am new to the dfs-datastores library and am trying to use Consolidator to aggregate a large number of small files on Hadoop into larger files. I have a beginner question. Consolidator is exposed as an API. However, the jar containing the classes that implement PathLister and RecordStreamFactory must also be made available inside the MR job through some other mechanism (DistributedCache or otherwise). Is that the right way to use Consolidator? Since Consolidator has no hook for adding a jar to the DistributedCache, would we need a separate deploy step that ships this jar (containing the PathLister and RecordStreamFactory implementations) on its own?
Thanks, ~Rahul