Vrushali Channapattan

9 issues by Vrushali Channapattan

As per https://issues.apache.org/jira/browse/MAPREDUCE-5464, hRaven's cost calculations need to be updated to use the Hadoop 2 megabyte-millis counter instead of computing megabyte-millis on their own.
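
A minimal Java sketch of the proposed change, assuming the job's counters are available as a map from counter name to value. The counter names `MB_MILLIS_MAPS` and `MB_MILLIS_REDUCES`, the class name and the per-GB-hour rate are assumptions for illustration, not hRaven's actual code; check the names against the Hadoop 2 release in use.

```java
import java.util.Map;

/** Hypothetical sketch: job cost derived from Hadoop 2 megabyte-millis counters. */
class MbMillisCost {
    // Assumed counter names; verify against the JobCounter enum of the Hadoop version in use.
    static final String MB_MILLIS_MAPS = "MB_MILLIS_MAPS";
    static final String MB_MILLIS_REDUCES = "MB_MILLIS_REDUCES";

    // 1 GB-hour expressed in megabyte-milliseconds: 1024 MB * 3600 s * 1000 ms.
    private static final double MB_MILLIS_PER_GB_HOUR = 1024.0 * 3600.0 * 1000.0;

    /**
     * Reads megabyte-millis straight from the counters instead of recomputing it
     * from task runtimes and container sizes. Missing counters count as zero.
     */
    static double costFromCounters(Map<String, Long> counters, double costPerGbHour) {
        long mbMillis = counters.getOrDefault(MB_MILLIS_MAPS, 0L)
                + counters.getOrDefault(MB_MILLIS_REDUCES, 0L);
        return mbMillis / MB_MILLIS_PER_GB_HOUR * costPerGbHour;
    }

    public static void main(String[] args) {
        // One GB-hour of map work at a hypothetical rate of 2.0 per GB-hour.
        Map<String, Long> counters = Map.of(MB_MILLIS_MAPS, 1024L * 3600 * 1000);
        System.out.println(costFromCounters(counters, 2.0)); // prints 2.0
    }
}
```

Taking the value from the counters keeps hRaven's figure consistent with the one the framework itself reports, rather than a separately derived estimate.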

An optimization that can be done in the Preprocessor: check the timestamp of a directory before listing all the files in it. This avoids putting unnecessary load on the NameNode (NN).
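
A sketch of the timestamp check, written against a hypothetical `DirectoryView` interface so it stays self-contained. With Hadoop, the timestamp could come from `FileSystem#getFileStatus(path).getModificationTime()` and the listing from `FileSystem#listStatus(path)`. It assumes that a directory's modification time changes whenever files are added to it, and that the caller remembers when the directory was last processed.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical minimal view of a file system; not a real Hadoop API. */
interface DirectoryView {
    long modificationTime(String dir);   // cheap metadata lookup
    List<String> listFiles(String dir);  // expensive listing we want to avoid
}

class IncrementalLister {
    /**
     * Lists the directory only if its modification time is newer than the last
     * processed run; otherwise returns an empty list without listing anything.
     */
    static List<String> listIfModified(DirectoryView fs, String dir, long lastProcessedMillis) {
        if (fs.modificationTime(dir) <= lastProcessedMillis) {
            return Collections.emptyList(); // unchanged since last run: skip the listing
        }
        return fs.listFiles(dir);
    }
}
```

The trade-off is one metadata call per directory in exchange for skipping full listings of directories that have not changed since the previous run.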

Create per-day and per-week aggregations in hRaven. The aggregation can be done in the Processing step itself.
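
One way to roll job metrics up per day and per week during processing, sketched in Java. The `JobStat` type, the choice of megabyte-millis as the summed metric, UTC bucketing and Monday-based weeks are all assumptions for illustration, not hRaven's design.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.TemporalAdjusters;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TimeBucketAggregator {
    /** Hypothetical per-job record: submit time and a metric to sum. */
    static final class JobStat {
        final long submitTimeMillis;
        final long megabyteMillis;

        JobStat(long submitTimeMillis, long megabyteMillis) {
            this.submitTimeMillis = submitTimeMillis;
            this.megabyteMillis = megabyteMillis;
        }
    }

    /** Calendar day (UTC, by assumption) containing the given timestamp. */
    static LocalDate dayOf(long millis) {
        return Instant.ofEpochMilli(millis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    /** Monday that starts the week containing the given timestamp. */
    static LocalDate weekOf(long millis) {
        return dayOf(millis).with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
    }

    /** Sums the metric per day or per week; TreeMap keeps buckets in date order. */
    static Map<LocalDate, Long> aggregate(List<JobStat> jobs, boolean weekly) {
        Map<LocalDate, Long> totals = new TreeMap<>();
        for (JobStat job : jobs) {
            LocalDate bucket = weekly ? weekOf(job.submitTimeMillis) : dayOf(job.submitTimeMillis);
            totals.merge(bucket, job.megabyteMillis, Long::sum);
        }
        return totals;
    }
}
```

Because each job maps to exactly one day bucket and one week bucket, both totals can be updated in the same pass over the processed jobs.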

Currently hRaven supports REST APIs at the job, flow, and app summary levels. It would be good to add task-level REST APIs to hRaven.

Presently hRaven includes job-level statistics. It collects runtime data and statistics from MapReduce jobs running on Hadoop clusters and stores the collected job history in an easily...

The work that has gone into refactoring the code and enhancing hRaven has been aimed at ensuring that hRaven can run on any Hadoop version and process any version of job history...

There are some enums in the JobHistoryFileParserHadoop2 class. It would be good to refactor them out into individual classes.

As noted in #34, we need a long-term fix for dealing with counter subgroup names. Presently hRaven stores the subgroup name for each counter in the column name...

Specifically, check whether the toString call at https://github.com/twitter/hraven/blob/master/hraven-etl/src/main/java/com/twitter/hraven/etl/JobHistoryFileParserHadoop2.java#L228 can be optimized away.