python_mozetl
Add existing ETL jobs to CLI to schedule using `mozetl-submit`
Commands in mozetl can now take advantage of a single submission script to schedule jobs without the boilerplate. This makes testing the job on EMR and airflow much easier.
The following jobs can make use of bin/mozetl-submit.sh:
- [x] clients daily
- [x] churn
- [ ] firefox mau dau #141
- [ ] hardware report #140
- [x] search dashboard
- [x] search rollup
- [ ] shield privacy (?) #142
- [x] tab spinner #143
- [x] containers test pilot #144
- [ ] containers shield #145
- [x] test pilot pulse #146
- [x] test pilot mau dau #147
- [ ] topline summary #148
- [ ] topline dashboard #149
Jobs can be registered with the command line using Click.
- Wrap the main entrypoint of the job with the `@click.command` decorator
  - Use options (`@click.option`, `--my-option`) to support values read from the environment.
  - Example: mozetl/maudau/maudau.py
- Register the main function with `mozetl.cli.entry_point`
- [optional but recommended] Test the job on ATMO using `mozetl-submit.sh`
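The registration steps above can be sketched as follows. This is a minimal illustration, not the actual mozetl code: the job body, the `example` command name, the `--bucket` option, and the local `entry_point` group are hypothetical stand-ins, and it assumes the environment mapping is done through Click's `auto_envvar_prefix` with a `MOZETL` prefix.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("--start-date", required=True, help="Submission date, YYYYMMDD.")
@click.option("--bucket", default="example-bucket", help="Output bucket (hypothetical).")
def main(start_date, bucket):
    """Hypothetical job entry point; a real job would start Spark here."""
    click.echo("running for {} into {}".format(start_date, bucket))


@click.group()
def entry_point():
    """Local stand-in for the group exposed as mozetl.cli.entry_point."""


# Register the job under the name used on the command line.
entry_point.add_command(main, "example")

# Exercise the command in-process. With auto_envvar_prefix="MOZETL" (an
# assumption about how the CLI is invoked), Click resolves --start-date
# from the MOZETL_EXAMPLE_START_DATE environment variable.
runner = CliRunner()
result = runner.invoke(
    entry_point,
    ["example"],
    env={"MOZETL_EXAMPLE_START_DATE": "20170101"},
    auto_envvar_prefix="MOZETL",
)
print(result.output)
```

Because every argument is declared with `@click.option`, the same command works with explicit flags (`--start-date 20170101`) or with values supplied purely through the environment, which is what a scheduler needs.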
Additionally, if jobs are scheduled on airflow:
- Modify the airflow entry to use `mozetl_envvar` to wrap environment variables. This converts the airflow environment variables to the appropriate mozetl ones.
  - Example: telemetry-airflow/dags/churn.py
- Test by passing additional variables to the `mozetl_envvar` function (`MOZETL_${COMMAND}_${OPTION}`). Set `MOZETL_GIT_PATH` and `MOZETL_GIT_BRANCH` appropriately as pass-through variables.
- Request an airflow deploy
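To make the `MOZETL_${COMMAND}_${OPTION}` naming concrete, here is a small sketch of the kind of conversion described above. `build_mozetl_env` is a hypothetical helper written for illustration only; it is not the real `mozetl_envvar` function, whose signature may differ, and the option values and repository URL are placeholders.

```python
def build_mozetl_env(command, options, passthrough=None):
    """Hypothetical helper: prefix each option as MOZETL_<COMMAND>_<OPTION>.

    Dashes in option names become underscores so that `start-date`
    lines up with the `--start-date` command line option.
    """
    env = {}
    for name, value in options.items():
        key = "MOZETL_{}_{}".format(command, name).upper().replace("-", "_")
        env[key] = str(value)
    # Pass-through variables are copied unchanged.
    env.update(passthrough or {})
    return env


env = build_mozetl_env(
    "churn",
    {"start-date": "20170101", "bucket": "example-bucket"},
    passthrough={
        # Placeholder values for testing a branch on a fork.
        "MOZETL_GIT_PATH": "https://example.com/fork/python_mozetl.git",
        "MOZETL_GIT_BRANCH": "my-feature",
    },
)
for key in sorted(env):
    print(key, "=", env[key])
```

Keeping the prefixing in one helper means a DAG only lists option names and values, and the variable names stay in step with the command's options.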
Review Checklist
mozetl
- Are all the arguments using `@click.option`?
- Are all required options using `required=True`?
- Is the command registered with the command line interface?
airflow
- Is `utils.mozetl` imported into the scope?
- Are the variables wrapped using `mozetl_envvar`?
- Does the name of the command match the name registered in `mozetl.cli`?
- Do the environment variable names match up with the option names?
- Are all required options accounted for?
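The last two airflow checks can be partly automated. The sketch below is one possible approach, not part of mozetl or telemetry-airflow: `missing_required` is a hypothetical helper, the `example` command is made up, and the `MOZETL` prefix is assumed. It inspects a Click command's declared parameters and reports required options that have no matching environment variable.

```python
import click


@click.command()
@click.option("--start-date", required=True)
@click.option("--bucket", required=True)
@click.option("--sample", default=1)
def main(start_date, bucket, sample):
    """Hypothetical job used only to demonstrate the check."""


def missing_required(command, command_name, env, prefix="MOZETL"):
    """Return the env var names of required options that are absent from env."""
    missing = []
    for param in command.params:
        if not param.required:
            continue
        # param.name is the Python identifier, e.g. "start_date".
        key = "{}_{}_{}".format(prefix, command_name, param.name).upper()
        if key not in env:
            missing.append(key)
    return missing


# Environment as a DAG might define it; --bucket has been forgotten.
dag_env = {"MOZETL_EXAMPLE_START_DATE": "20170101"}
missing = missing_required(main, "example", dag_env)
print(missing)
```

A misspelled variable name shows up the same way as a forgotten one, since the expected key is simply not found.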