python_mozetl

Add existing ETL jobs to CLI to schedule using `mozetl-submit`

Open acmiyaguchi opened this issue 8 years ago • 1 comment

Commands in mozetl can now be scheduled through a single submission script, without per-job boilerplate. This makes testing a job on EMR and Airflow much easier.

The following jobs can make use of bin/mozetl-submit.sh:

  • [x] clients daily
  • [x] churn
  • [ ] firefox mau dau #141
  • [ ] hardware report #140
  • [x] search dashboard
  • [x] search rollup
  • [ ] shield privacy (?) #142
  • [x] tab spinner #143
  • [x] containers test pilot #144
  • [ ] containers shield #145
  • [x] test pilot pulse #146
  • [x] test pilot mau dau #147
  • [ ] topline summary #148
  • [ ] topline dashboard #149

Jobs can be registered with the command line using Click.

  • Wrap the main entrypoint of the job with the @click.command decorator
    • Use options (@click.option, e.g. --my-option) rather than positional arguments, so that values can be read from the environment.
    • Example: mozetl/maudau/maudau.py
  • Register the main function with mozetl.cli.entry_point
  • [optional but recommended] Test the job on ATMO using mozetl-submit.sh
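
The registration steps above can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: the local `entry_point` group stands in for `mozetl.cli.entry_point` (assumed here to be a Click group), and the job name `example_job`, its options, and the `MOZETL` environment prefix are all hypothetical. The dry run uses Click's test runner to show options being filled from environment variables instead of flags.

```python
import click
from click.testing import CliRunner


@click.group()
def entry_point():
    """Stand-in for mozetl.cli.entry_point, assumed here to be a Click group."""


@click.command()
@click.option("--start-date", required=True, help="Date to process (YYYYMMDD).")
@click.option("--bucket", required=True, help="Output bucket name.")
@click.option("--prefix", default="example_job/v1", help="Output key prefix.")
def main(start_date, bucket, prefix):
    """Hypothetical job entry point; a real job would start its Spark work here."""
    click.echo("s3://{}/{}/submission_date={}".format(bucket, prefix, start_date))


# Register the job under the name used on the command line.
entry_point.add_command(main, "example_job")

# Dry run: no flags are passed, so Click resolves each option from
# <PREFIX>_<COMMAND>_<OPTION>. The "MOZETL" prefix is an assumption.
env = {
    "MOZETL_EXAMPLE_JOB_START_DATE": "20170804",
    "MOZETL_EXAMPLE_JOB_BUCKET": "example-bucket",
}
result = CliRunner().invoke(
    entry_point, ["example_job"], env=env, auto_envvar_prefix="MOZETL"
)
print(result.output)
```

Because every input is an option, the same command works with explicit flags (`example_job --start-date 20170804 --bucket example-bucket`) or with environment variables only, which is what a scheduler would typically provide.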

Additionally, if jobs are scheduled on airflow:

  • Modify the Airflow entry to wrap its environment variables with mozetl_envvar. This converts the Airflow environment variables into the corresponding mozetl ones.
  • Request an airflow deploy
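
To make the conversion concrete, here is a rough sketch of the kind of mapping such a wrapper could perform. It is not the actual `mozetl_envvar` helper: the function name `mozetl_envvar_sketch`, the `MOZETL_COMMAND` key, and the `MOZETL_<COMMAND>_<OPTION>` naming scheme are assumptions chosen to line up with Click's automatic environment variable names.

```python
def mozetl_envvar_sketch(command, options):
    """Illustrative stand-in for mozetl_envvar; the real helper may differ.

    Maps a command name and its option values to prefixed environment
    variables, e.g. option "start-date" of "example_job" becomes
    MOZETL_EXAMPLE_JOB_START_DATE.
    """
    # Assumed key telling the submission script which command to run.
    env = {"MOZETL_COMMAND": command}
    for name, value in options.items():
        key = "MOZETL_{}_{}".format(command, name).upper().replace("-", "_")
        env[key] = str(value)
    return env


env = mozetl_envvar_sketch(
    "example_job",
    {"start-date": "20170804", "bucket": "example-bucket"},
)
for key in sorted(env):
    print("{}={}".format(key, env[key]))
```

Under this scheme, an option renamed on the mozetl side without a matching change in the Airflow entry would simply not be picked up, which is why the names on both sides need to agree.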

acmiyaguchi, Aug 04 '17 09:08

Review Checklist

mozetl

  • Are all the arguments using @click.option?
  • Are all required options using required=True?
  • Is the command registered with the command line interface?
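
The first two items can be checked mechanically by inspecting a command's parameters. The helper below is a hypothetical sketch, not part of mozetl: `review_command` and the deliberately flawed `main` job are invented for illustration, and the set of required names has to be supplied by the reviewer.

```python
import click


def review_command(command, required_names):
    """Return checklist violations for a Click command (hypothetical helper)."""
    problems = []
    for param in command.params:
        if not isinstance(param, click.Option):
            # Positional arguments cannot be filled from prefixed env vars.
            problems.append("{} is not a @click.option".format(param.name))
        elif param.name in required_names and not param.required:
            problems.append("{} should set required=True".format(param.name))
    return problems


@click.command()
@click.argument("start_date")
@click.option("--bucket")
@click.option("--prefix", default="example_job/v1")
def main(start_date, bucket, prefix):
    """Hypothetical job that violates two checklist items."""


problems = review_command(main, required_names={"start_date", "bucket"})
for problem in problems:
    print(problem)
```

Here `start_date` is flagged because it is a positional argument, and `bucket` because it is needed but not marked as required.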

airflow

  • Is utils.mozetl imported into the scope?
  • Are the variables wrapped using mozetl_envvar?
  • Does the name of the command match the one registered in mozetl.cli?
  • Do the environment variable names match the option names?
  • Are all required options accounted for?

acmiyaguchi, Aug 22 '17 22:08