glob list instead of simple hdfs list and pattern support for input

Open angadsingh opened this issue 11 years ago • 2 comments

Right now hraven accepts a single HDFS path as its input folder and fetches all job history and conf files underneath it. This pull request adds support for specifying a pattern with wildcards (*) and uses the HDFS API's globStatus method to list files instead of hraven's recursive listFiles method. This way one can easily shard hraven's job across different years, months, or days.
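To make the pattern idea concrete, here is a minimal local sketch of glob-based selection over a yyyy/mm/dd layout. It is not the code from this pull request and does not touch Hadoop: it swaps in `java.nio`'s glob `PathMatcher` on a temporary local directory as a stand-in for HDFS's `globStatus`. The class name `GlobListingSketch`, the helper `listMatching`, and the sample file names are all hypothetical. Unlike `globStatus`, the sketch walks the whole tree and filters afterwards, so it only illustrates which paths a pattern selects, not how the listing is performed.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GlobListingSketch {

    /**
     * Returns the regular files under root whose root-relative path matches
     * the glob, sorted. Hypothetical helper: a local stand-in for a glob
     * listing, not hraven's or Hadoop's implementation.
     */
    static List<String> listMatching(Path root, String glob) throws IOException {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .map(root::relativize)          // match against the relative path
                       .filter(matcher::matches)       // '*' does not cross directory boundaries
                       .map(p -> p.toString().replace('\\', '/'))
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small yyyy/mm/dd tree with made-up history file names.
        Path root = Files.createTempDirectory("history");
        String[] files = {
            "2014/05/31/job_1.xml",
            "2014/06/01/job_2.xml",
            "2014/06/29/job_3.xml"
        };
        for (String f : files) {
            Path p = root.resolve(f);
            Files.createDirectories(p.getParent());
            Files.createFile(p);
        }
        // Select only the June 2014 shard.
        System.out.println(listMatching(root, "2014/06/*/*"));
    }
}
```

With a pattern such as `2014/06/*/*`, only the files under the June 2014 directories are selected, which is the kind of per-month sharding the description has in mind.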

angadsingh avatar Jun 29 '14 11:06 angadsingh

Interesting idea. Doesn't the RM already do this (sharding history files by date)? For Hadoop 1 we had the original directory all in one place (where the history server can read from), then we separately ran JobFilePartitioner to shard the files into a yyyy/mm/dd directory structure. Are you using a different setup? Can you explain how your history files appear in one place and then get sharded, or how that works for you?

jrottinghuis avatar Jul 01 '14 03:07 jrottinghuis

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Angad Singh seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

CLAassistant avatar Jul 18 '19 15:07 CLAassistant