Vrushali Channapattan issues

Results: 9 issues of Vrushali Channapattan

Calculate job cost based on hadoop2 counters of megabyte millis

As per https://issues.apache.org/jira/browse/MAPREDUCE-5464, hRaven's cost calculations need to be updated to use the hadoop2 megabytemillis counter instead of calculating megabytemillis on our own.
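
A minimal sketch of how a cost could be derived from the counters, not hRaven's actual formula. It assumes the job exposes the `MB_MILLIS_MAPS` and `MB_MILLIS_REDUCES` job counters introduced by MAPREDUCE-5464; the class name `MegabyteMillisCost` and the parameters `costPerMachineDay` and `machineMemoryMb` are hypothetical and chosen only for illustration.

```java
/** Hypothetical helper for illustration; not part of hRaven. */
final class MegabyteMillisCost {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    /**
     * Converts the megabyte-millis counters of a job into a cost.
     *
     * @param mbMillisMaps      value read from the MB_MILLIS_MAPS counter
     * @param mbMillisReduces   value read from the MB_MILLIS_REDUCES counter
     * @param costPerMachineDay assumed cost of running one machine for a day
     * @param machineMemoryMb   assumed memory of one machine, in megabytes
     */
    static double jobCost(long mbMillisMaps, long mbMillisReduces,
                          double costPerMachineDay, long machineMemoryMb) {
        if (machineMemoryMb <= 0) {
            throw new IllegalArgumentException("machineMemoryMb must be positive");
        }
        // Total memory-time consumed by the job, taken directly from the counters.
        double totalMbMillis = (double) mbMillisMaps + (double) mbMillisReduces;
        // Express the memory-time as a fraction of one machine's memory for one day.
        double machineDays = totalMbMillis / ((double) machineMemoryMb * MILLIS_PER_DAY);
        return machineDays * costPerMachineDay;
    }
}
```

Under these assumptions, a job that holds the full memory of one machine for exactly one day costs exactly `costPerMachineDay`; the point is only that no per-task memory arithmetic is needed once the counters are available.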

In Preprocessor, check timestamp of directory before listing all files in that dir

An optimization for the Preprocessor: check the timestamp of a directory before listing all files in it. If the directory has not changed since the last run, the listing can be skipped, which avoids putting unnecessary load on the NameNode (NN).
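
The idea can be sketched as follows. This is an illustration only: it uses `java.io.File` on a local filesystem as a stand-in so that it runs without a cluster, whereas the Preprocessor works against HDFS. The class `DirectoryScanSketch`, the method `listIfModifiedSince`, and the `lastScanMillis` parameter are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; a local-filesystem stand-in for the HDFS case. */
final class DirectoryScanSketch {

    /**
     * Lists the entries of dir only if the directory was modified after the
     * previous scan; otherwise returns an empty list without listing anything.
     */
    static List<File> listIfModifiedSince(File dir, long lastScanMillis) {
        // Cheap metadata lookup: one call, independent of the number of files.
        long dirModified = dir.lastModified();
        if (dirModified <= lastScanMillis) {
            return new ArrayList<>();
        }
        // Expensive step: enumerate every entry in the directory.
        File[] entries = dir.listFiles();
        if (entries == null) {
            // Not a directory, or an I/O error occurred.
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(entries));
    }
}
```

A caller would remember the time of its previous run and pass it as `lastScanMillis`, so unchanged directories cost a single metadata lookup instead of a full listing.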

Aggregate app per day and per week during hRaven Processing of each job

Create per-day and per-week aggregations in hRaven. The aggregation can be done at the Processing step itself.
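
One way to picture aggregating while each job is processed is the sketch below. It is not hRaven's schema or code: the class `AppAggregationSketch`, the in-memory maps, the `appId!date` key layout, the use of UTC, and Monday as the start of a week are all assumptions made for illustration.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.TemporalAdjusters;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory aggregator; real storage would differ. */
final class AppAggregationSketch {
    private final Map<String, Long> daily = new HashMap<>();
    private final Map<String, Long> weekly = new HashMap<>();

    /** UTC calendar day that contains the job's submit time. */
    static LocalDate dayBucket(long submitTimeMillis) {
        return Instant.ofEpochMilli(submitTimeMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    /** Monday of the week that contains the job's submit time (assumed convention). */
    static LocalDate weekBucket(long submitTimeMillis) {
        return dayBucket(submitTimeMillis)
                .with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
    }

    /** Called once per processed job: updates both buckets in the same step. */
    void addJob(String appId, long submitTimeMillis, long value) {
        daily.merge(appId + "!" + dayBucket(submitTimeMillis), value, Long::sum);
        weekly.merge(appId + "!" + weekBucket(submitTimeMillis), value, Long::sum);
    }

    long dailyTotal(String appId, LocalDate day) {
        return daily.getOrDefault(appId + "!" + day, 0L);
    }

    long weeklyTotal(String appId, LocalDate weekStart) {
        return weekly.getOrDefault(appId + "!" + weekStart, 0L);
    }
}
```

Because both buckets are updated in `addJob`, no separate pass over the stored jobs is needed to produce the daily and weekly totals.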

Add task level REST APIs to hRaven

Currently hRaven supports job, flow and app summary level REST APIs. It would be good to add task level REST APIs to hRaven.

Extend hRaven to include hdfs usage

Presently hRaven includes job level statistics. It collects run time data and statistics from map reduce jobs running on Hadoop clusters and stores the collected job history in an easily...

Ensure hadoop 1.0 job history files can be processed on 2.0 cluster

The work on refactoring the code and enhancing hRaven has been aimed at ensuring that hRaven can run on any Hadoop version and process any version of job history...

Refactor enums in JobHistoryFileParserHadoop2 into separate classes

There are some enums in the JobHistoryFileParserHadoop2 class. It would be good to refactor them out into individual classes.

Shorten counter sub group names - long term fix

As noted in #34, we need a long term fix for dealing with counter subgroup names. Presently hRaven stores the subgroup name for each counter in the column name....

Investigate if some steps in the json parsing can be optimized

Specifically, check whether the toString call can be optimized away at https://github.com/twitter/hraven/blob/master/hraven-etl/src/main/java/com/twitter/hraven/etl/JobHistoryFileParserHadoop2.java#L228