telemetry-analysis-service

Refactor EMR scripts

Open jezdez opened this issue 8 years ago • 3 comments

Quoting from https://bugzilla.mozilla.org/show_bug.cgi?id=1312747:

Currently airflow and atmo are using two different EMR steps [1] [2] for almost the same logic. We should refactor those into a single script and add that directly to the telemetry-analysis-service repository so that we can have different steps in different environments, like staging and production. The bootstrap script [3] and the Spark configuration [4] should also be moved to telemetry-analysis-service.

[1] https://github.com/mozilla/emr-bootstrap-spark/blob/master/ansible/files/batch.sh
[2] https://github.com/mozilla/telemetry-airflow/blob/master/ansible/files/spark/airflow.sh
[3] https://github.com/mozilla/emr-bootstrap-spark/blob/master/ansible/files/telemetry.sh
[4] https://github.com/mozilla/emr-bootstrap-spark/blob/master/ansible/files/configuration.json

jezdez, Feb 13 '17 16:02

@vitillo Regarding the emr-bootstrap-spark repository, can you clarify:

  • whether it should continue to exist, or whether I can merge its content completely into this repo?
  • whether the Ansible playbook should be kept for testing purposes?

My gut feeling is that this should live under a "deploy" folder in the root of this repo, including the Ansible playbook. We would then update the ATMO and Airflow code to fetch the shell scripts from there (maybe even from GitHub's raw file serving URL?).
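A rough sketch of the raw-URL idea, in case it helps the discussion. The `deploy` folder name comes from the suggestion above; everything else is an assumption for illustration only: the mapping of environments to git refs, the `production` branch name, the `batch.sh` file name under `deploy`, and the helper `script_url` itself are hypothetical and not decided anywhere in this thread.

```python
from urllib.parse import quote

# GitHub serves raw file contents from this host as /<owner>/<repo>/<ref>/<path>.
RAW_BASE = "https://raw.githubusercontent.com"
REPO = "mozilla/telemetry-analysis-service"

# Hypothetical mapping of deployment environment to git ref (assumption).
ENV_REFS = {
    "staging": "master",
    "production": "production",
}


def script_url(environment, script_name, deploy_dir="deploy"):
    """Build the raw GitHub URL of a shell script for the given environment."""
    try:
        ref = ENV_REFS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
    # quote() keeps "/" intact by default and escapes anything unsafe.
    path = quote(f"{deploy_dir}/{script_name}")
    return f"{RAW_BASE}/{REPO}/{ref}/{path}"


if __name__ == "__main__":
    print(script_url("staging", "batch.sh"))
```

Pinning each environment to its own ref would let staging pick up script changes before production does, without either ATMO or Airflow carrying a copy of the script.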

Since we have some ideas for making the ATMO code agnostic and removing the Mozilla-specific bits (#199), is there anything you want me to keep an eye on?

jezdez, Feb 21 '17 11:02

whether it should continue to exist or if I can merge its content completely into this repo?

I am generally OK with merging its content in this repo as long as it's easy to deploy changes to our Spark environments independently from the web-service if the need arises.

should the Ansible playbook be kept for testing purposes?

To create or update an environment, a parameterized CloudFormation script has to be run. Ansible provides a convenient way to do so, among other things. Do you have something else in mind?

Ideally, it should be easy to deploy a new test environment manually using a script (e.g. a playbook), so that Operations & Engineering can experiment with different configuration options.
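To illustrate what "parameterized" could look like from a script, here is a minimal sketch that merges per-environment overrides into defaults and emits the list-of-dicts shape that boto3's CloudFormation `create_stack` call accepts for its `Parameters` argument. The parameter names and default values are made up for the example and do not come from the actual template.

```python
# Hypothetical template parameters and defaults (assumption, for illustration).
DEFAULTS = {
    "SparkInstanceType": "c3.4xlarge",
    "SparkWorkers": "1",
}


def stack_parameters(overrides=None):
    """Merge overrides into DEFAULTS and return CloudFormation-style parameters."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Fail early rather than let a typo silently fall back to a default.
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    merged = {**DEFAULTS, **overrides}
    # CloudFormation parameter values are passed as strings.
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted(merged.items())
    ]


if __name__ == "__main__":
    # A test environment that only changes the worker count.
    for param in stack_parameters({"SparkWorkers": 4}):
        print(param)
```

Whether this lives in an Ansible playbook or a plain script, keeping the overrides small and explicit makes it easy to spin up a test environment that differs from production in only one or two settings.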

vitillo, Feb 21 '17 14:02

This is really more of a meta-bug, which should ultimately be broken down into a handful of smaller issues. Component issues will be opened soon, and this issue will be updated to refer to them as that happens.

rafrombrc, Apr 26 '17 21:04