azkaban-plugins
Pig Job Auto Tuning Integration
This change integrates the Pig job type with the auto tuning framework (https://github.com/linkedin/dr-elephant/wiki/Auto-Tuning), which automatically tunes Hadoop jobs. The framework currently supports resource usage optimization for Pig jobs. Integration is straightforward: the job calls the getCurrentRunParameters API before it runs, the API responds with suggested parameters, and those parameters are used to configure the job.
Design
- Currently, configurations are injected using HadoopConfigurationInjector. We have written a new HadoopTuningConfigurationInjector, which first injects the default configuration, then calls the getCurrentRunParameter API to get the parameters for the current run from the auto tuning framework, and injects those parameters into the job's configuration.
- Failure handling:
  - API down: if the API is down, the job falls back to the default configuration.
  - Other failures: if the job fails for any other reason, we analyze the log to determine whether the failure was caused by auto tuning. If it was, we call the getCurrentRunParameter API again with the isRetry flag enabled; in that case the API returns the best parameters among those already tried.
- We will create a new wrapper, HadoopTuneInSecurePigWrapper, similar to HadoopSecurePigWrapper, that implements the new flow: injecting parameters, handling failures, and calling the status API. HadoopPigJob will choose which wrapper to use based on the auto_tuning_enabled flag (default false). An alternative approach is to branch on the auto_tuning_enabled flag within the existing flow and not create HadoopTuneInSecurePigWrapper at all.
- We will avoid configuration overrides in the script: to enable auto tuning, users will be asked not to keep any configuration inside the Pig script, since settings there could override the tuned parameters.
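The Design points above can be condensed into one flow. This is a minimal sketch under stated assumptions, not the PR's implementation: `TuningClient`, `chooseWrapper`, and `buildConf` are hypothetical names, a plain `Properties` object stands in for the Hadoop job configuration, and the wrappers are represented only by their class names. It shows the auto_tuning_enabled flag defaulting to false, defaults injected before tuned parameters, the fallback to defaults when the API call fails, and the isRetry flag being passed after a failure that log analysis attributed to tuning.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of the Design flow; names and signatures are illustrative,
// not the PR's classes. Properties stands in for the Hadoop job configuration.
class PigAutoTuningFlowSketch {

  static final String AUTO_TUNING_ENABLED = "auto_tuning_enabled";

  /** Hypothetical client for the auto tuning service. */
  interface TuningClient {
    // isRetry = true asks for the best parameters among those already tried.
    Map<String, String> getCurrentRunParameter(boolean isRetry) throws IOException;
  }

  /** Auto tuning is opt-in: the flag defaults to false when it is absent. */
  static String chooseWrapper(Properties jobProps) {
    boolean enabled = Boolean.parseBoolean(jobProps.getProperty(AUTO_TUNING_ENABLED, "false"));
    return enabled ? "HadoopTuneInSecurePigWrapper" : "HadoopSecurePigWrapper";
  }

  /**
   * Defaults first, tuned parameters second, so a tuned value wins on any key
   * both define. retryAfterTuningFailure is assumed to come from log analysis
   * of the previous attempt.
   */
  static Properties buildConf(Map<String, String> defaults, TuningClient client,
      boolean retryAfterTuningFailure) {
    Properties conf = new Properties();
    conf.putAll(defaults);
    try {
      conf.putAll(client.getCurrentRunParameter(retryAfterTuningFailure));
    } catch (IOException e) {
      // API down: fall back to the default configuration only.
    }
    return conf;
  }

  public static void main(String[] args) {
    Map<String, String> defaults = new HashMap<>();
    defaults.put("mapreduce.map.memory.mb", "2048");
    defaults.put("mapreduce.reduce.memory.mb", "4096");

    TuningClient up = isRetry -> Collections.singletonMap("mapreduce.map.memory.mb", "1536");
    TuningClient down = isRetry -> {
      throw new IOException("tuning service unavailable");
    };

    System.out.println(chooseWrapper(new Properties()));  // HadoopSecurePigWrapper
    System.out.println(buildConf(defaults, up, false).getProperty("mapreduce.map.memory.mb"));  // 1536
    System.out.println(buildConf(defaults, down, false).getProperty("mapreduce.map.memory.mb"));  // 2048
  }
}
```

Layering with a second `putAll` keeps the precedence rule in one place: whatever the tuning service returns overrides the defaults, and a failed call leaves the defaults untouched.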
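Because auto tuning relies on the script not overriding configuration, a pre-flight check could flag Pig `set` statements. The helper below is hypothetical (`PigScriptOverrideCheck` is not part of the PR) and uses a line-start regular expression, so it is only a heuristic: it does not understand comments, string literals, or macros.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic check, not part of the PR: lists the keys a Pig
// script sets with "set <key> <value>;" statements.
class PigScriptOverrideCheck {
  // "set" at the start of a line (case-insensitive), followed by the key.
  private static final Pattern SET_STATEMENT = Pattern.compile(
      "^[ \\t]*set[ \\t]+(\\S+)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);

  static List<String> findOverriddenKeys(String script) {
    List<String> keys = new ArrayList<>();
    Matcher m = SET_STATEMENT.matcher(script);
    while (m.find()) {
      keys.add(m.group(1));
    }
    return keys;
  }

  public static void main(String[] args) {
    String script = "SET mapreduce.map.memory.mb 4096;\n"
        + "A = LOAD 'input' AS (x:int);\n"
        + "set default_parallel 10;\n";
    // Prints [mapreduce.map.memory.mb, default_parallel]
    System.out.println(findOverriddenKeys(script));
  }
}
```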
Manual test cases: AzkabanPigJobTypeTestCases.xlsx
I have addressed all of the review comments except writing a test that mocks the API call. That will take some time, as I need to explore a bit.
@xkrogen and @inramana, thanks for the reviews.
In case you are not aware yet, I wrote a tip on how to update copyrights automatically; see https://github.com/azkaban/azkaban/wiki/Developer-Tools-and-Tips#use-intellij-to-create-and-update-copyright-automatically
It just reduces verbosity (the JDK 7 diamond operator): https://www.javaworld.com/article/2074080/core-java/jdk-7--the-diamond-operator.html
On Mon, May 21, 2018 at 1:42 PM mkumar1984 wrote:
@mkumar1984 commented on this pull request.
In plugins/jobtype/src/azkaban/jobtype/tuning/TuningErrorHandler.java https://github.com/azkaban/azkaban-plugins/pull/290#discussion_r189707931 :
+import java.util.ArrayList;
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+import org.apache.log4j.Logger;
+
+
+/**
+ * This class is responsible for finding whether failure is because of tuning parameters.
+ * This try to search predefined patterns in the log.
+ */
+public class TuningErrorHandler {
+  private Logger log = Logger.getRootLogger();
+  private static List<Pattern> errorPatterns = new ArrayList<Pattern>();
Done. BTW, what's the difference between the two?
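The question in the quoted review is about the JDK 7 diamond operator: `new ArrayList<>()` lets the compiler infer the type argument from the declared type, so it behaves the same as `new ArrayList<Pattern>()` and only drops the repetition. The sketch below uses it in a hypothetical, self-contained version of the log pattern search that the TuningErrorHandler Javadoc describes. The class name and the two error patterns are illustrative assumptions, not the patterns defined in the PR, and Log4j is left out to avoid the dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, simplified take on TuningErrorHandler: the patterns below are
// illustrative examples, not the ones defined in the PR.
class TuningErrorHandlerSketch {
  // Diamond operator (JDK 7+): same as new ArrayList<Pattern>(), with the
  // type argument inferred from the declared type.
  private static final List<Pattern> ERROR_PATTERNS = new ArrayList<>();

  static {
    ERROR_PATTERNS.add(Pattern.compile("java\\.lang\\.OutOfMemoryError"));
    ERROR_PATTERNS.add(Pattern.compile("running beyond (physical|virtual) memory limits"));
  }

  /** Returns true if any predefined error pattern occurs anywhere in the log. */
  static boolean isTuningRelatedFailure(String log) {
    for (Pattern pattern : ERROR_PATTERNS) {
      if (pattern.matcher(log).find()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isTuningRelatedFailure("Error: java.lang.OutOfMemoryError: Java heap space"));  // true
    System.out.println(isTuningRelatedFailure("Input path does not exist"));  // false
  }
}
```

Using `find()` rather than `matches()` lets each pattern hit anywhere in a multi-line log instead of having to match the whole text.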
Added a test case in TestTuningParameterUtils.java that mocks the API and tests updateAutoTuningParameters. This should address all of the review comments. Let me know if there are any other comments.
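For readers wondering how the API call can be mocked, here is a dependency-free sketch. The PR's actual test is not reproduced here: `TuningApi`, `FakeTuningApi`, and this `updateAutoTuningParameters(Properties, String, TuningApi)` signature are hypothetical. A hand-written fake stands in for the HTTP call, returns a canned suggestion, and records each request so the test can check both the call and the resulting configuration. With Mockito, `mock(TuningApi.class)` combined with `when(...).thenReturn(...)` and `verify(...)` would serve the same purpose.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch of testing parameter injection against a faked tuning API.
class MockTuningApiTestSketch {

  /** Hypothetical abstraction over the getCurrentRunParameters HTTP call. */
  interface TuningApi {
    Map<String, String> getCurrentRunParameters(String jobId);
  }

  /** Hypothetical code under test: overlays the suggestion onto the job properties. */
  static void updateAutoTuningParameters(Properties props, String jobId, TuningApi api) {
    for (Map.Entry<String, String> e : api.getCurrentRunParameters(jobId).entrySet()) {
      props.setProperty(e.getKey(), e.getValue());
    }
  }

  /** Fake that returns a canned suggestion and records each request. */
  static final class FakeTuningApi implements TuningApi {
    final List<String> requestedJobIds = new ArrayList<>();
    private final Map<String, String> cannedResponse;

    FakeTuningApi(Map<String, String> cannedResponse) {
      this.cannedResponse = cannedResponse;
    }

    @Override
    public Map<String, String> getCurrentRunParameters(String jobId) {
      requestedJobIds.add(jobId);
      return cannedResponse;
    }
  }

  private static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  public static void main(String[] args) {
    Map<String, String> canned = new HashMap<>();
    canned.put("mapreduce.map.memory.mb", "1536");
    FakeTuningApi api = new FakeTuningApi(canned);

    Properties props = new Properties();
    props.setProperty("mapreduce.map.memory.mb", "2048");
    props.setProperty("mapreduce.job.queuename", "default");

    updateAutoTuningParameters(props, "pig-job-1", api);

    check(api.requestedJobIds.size() == 1, "API should be called exactly once");
    check("pig-job-1".equals(api.requestedJobIds.get(0)), "wrong job id sent");
    check("1536".equals(props.getProperty("mapreduce.map.memory.mb")), "tuned value not applied");
    check("default".equals(props.getProperty("mapreduce.job.queuename")), "unrelated key changed");
    System.out.println("all checks passed");
  }
}
```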
LGTM, great work @mkumar1984!
@mkumar1984, can you resolve the conflicts? I can merge this once you are done.
I believe the code has been moved to the main azkaban repo.