fix(elasticsearch) Analytics indices creation on AWS ES

Open • tomas-kubin opened this issue 2 years ago • 1 comment

🥅 Goal

Fix issue #5376: the analytics Elasticsearch indices are created incorrectly on AWS ES, which leaves the DataHub Analytics page not working.

🔍 Details

When running against AWS Elasticsearch (aka Amazon OpenSearch), analytics indices tend to have problems (see issue #5376 or search Slack for datahub_usage_event-000001). This PR introduces three changes in the create-indices.sh script:

  1. refactoring the script: It contained a lot of copy-pasted code and was hard to follow or maintain. I added comments, extracted repeatedly used operations into functions, and unified the approaches.
  2. adding index fix: When the script detects that the datahub_usage_event index was created incorrectly (probably by GMS running without USE_AWS_ELASTICSEARCH set when it should have been), it drops the index and recreates it. This should help many struggling developers.
  3. configuration hint: After an ES endpoint error, the script tries to detect whether USE_AWS_ELASTICSEARCH should have been set and prints a hint about its usage.
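
As a rough illustration of the index fix in item 2, the check could look like the sketch below. It is not taken from create-indices.sh: the function name is_invalid_usage_index and the sample responses are hypothetical. It assumes that a healthy setup exposes datahub_usage_event as an alias of datahub_usage_event-000001, so a GET on the alias returns a definition keyed by the concrete index name.

```shell
#!/bin/sh
# Hypothetical helper, not copied from create-indices.sh.
# Assumption: in a healthy setup datahub_usage_event is an alias, so
# GET <es-url>/datahub_usage_event returns JSON keyed by the concrete
# index name (datahub_usage_event-000001).

# Succeeds (exit 0) when the definition does NOT mention the rollover
# index, i.e. datahub_usage_event looks like a plain, wrongly created index.
is_invalid_usage_index() {
  case "$1" in
    *'"datahub_usage_event-000001"'*) return 1 ;;  # alias resolves to the rollover index
    *) return 0 ;;                                  # plain index under the alias name
  esac
}

# Hand-written sample responses, for illustration only.
good='{"datahub_usage_event-000001":{"aliases":{"datahub_usage_event":{"is_write_index":true}}}}'
bad='{"datahub_usage_event":{"aliases":{}}}'

if is_invalid_usage_index "$bad"; then
  echo "would delete and recreate datahub_usage_event"
fi
```

Keeping the check as a pure string test on the response makes it easy to try out without a running cluster; a real script would feed it the body returned by the ES endpoint.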

🧪 Testing

I built the modified elasticsearch-setup-job image, used it in my DataHub Helm charts, and deployed with those charts.

My setup uses Amazon OpenSearch. I did not test the other case (plain Elasticsearch without USE_AWS_ELASTICSEARCH).

Case 1: clean slate

  • Nuke everything
  • Deploy the helm charts
  • Result: indexes created successfully
elasticsearch-setup-job log
2022/07/28 17:12:40 Waiting for: https://xxx.es.amazonaws.com:443
2022/07/28 17:12:40 Received 200 from https://xxx.es.amazonaws.com:443

>>> creating _opendistro/_ism/policies/datahub_usage_event_policy ...
{
  "policy": {
    "policy_id": "datahub_usage_event_policy",
    "description": "Datahub Usage Event Policy",
    "default_state": "Rollover",
    "schema_version": 1,
    "states": [
      {
        "name": "Rollover",
        "actions": [
          {
            "rollover": {
              "min_index_age": "1d"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "ReadOnly",
            "conditions": {
              "min_index_age": "7d"
            }
          }
        ]
      },
      {
        "name": "ReadOnly",
        "actions": [
          {
            "read_only": {}
          }
        ],
        "transitions": [
          {
            "state_name": "Delete",
            "conditions": {
              "min_index_age": "60d"
            }
          }
        ]
      },
      {
        "name": "Delete",
        "actions": [
          {
            "delete": {}
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": [
        "datahub_usage_event-*"
      ],
      "priority": 100
    }
  }
}{"_id":"datahub_usage_event_policy","_version":1,"_primary_term":1,"_seq_no":0,"policy":{"policy":{"policy_id":"datahub_usage_event_policy","description":"Datahub Usage Event Policy","last_updated_time":1659028360937,"schema_version":1,"error_notification":null,"default_state":"Rollover","states":[{"name":"Rollover","actions":[{"rollover":{"min_index_age":"1d"}}],"transitions":[{"state_name":"ReadOnly","conditions":{"min_index_age":"7d"}}]},{"name":"ReadOnly","actions":[{"read_only":{}}],"transitions":[{"state_name":"Delete","conditions":{"min_index_age":"60d"}}]},{"name":"Delete","actions":[{"delete":{}}],"transitions":[]}],"ism_template":[{"index_patterns":["datahub_usage_event-*"],"priority":100,"last_updated_time":1659028360937}]}}}
>>> creating _template/datahub_usage_event_index_template ...
{
  "index_patterns": ["datahub_usage_event-*"],
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "type": {
        "type": "keyword"
      },
      "timestamp": {
        "type": "date"
      },
      "userAgent": {
        "type": "keyword"
      },
      "browserId": {
        "type": "keyword"
      }
    }
  },
  "settings": {
    "index.opendistro.index_state_management.rollover_alias": "datahub_usage_event"
  }
}{"acknowledged":true}
>>> creating datahub_usage_event-000001 ...
{
  "aliases": {
    "datahub_usage_event": {
      "is_write_index": true
    }
  }
}
2022/07/28 17:12:41 Command finished successfully.

Case 2: invalid index

  • Nuke everything
  • Deploy with USE_AWS_ELASTICSEARCH not set -> elasticsearch-setup-job fails (see log below)
  • Restart GMS
  • Result: analytics not working, but there is a configuration hint in the elasticsearch-setup-job logs
elasticsearch-setup-job log
2022/07/28 17:20:49 Waiting for: https://xxx.es.amazonaws.com:443
2022/07/28 17:20:49 Received 200 from https://xxx.es.amazonaws.com:443

>>> failed to GET _ilm/policy/datahub_usage_event_policy (401) !
... looks like AWS OpenSearch is used; please set USE_AWS_ELASTICSEARCH env value to true
2022/07/28 17:20:49 Command exited with error: exit status 1
  • Redeploy with correctly set USE_AWS_ELASTICSEARCH=true
  • Result: elasticsearch-setup-job runs successfully, analytics now working correctly
elasticsearch-setup-job log
2022/07/28 17:26:10 Received 200 from https://xxx.es.amazonaws.com:443

>>> creating _opendistro/_ism/policies/datahub_usage_event_policy ...
{
  "policy": {
    "policy_id": "datahub_usage_event_policy",
    "description": "Datahub Usage Event Policy",
    "default_state": "Rollover",
    "schema_version": 1,
    "states": [
      {
        "name": "Rollover",
        "actions": [
          {
            "rollover": {
              "min_index_age": "1d"
            }
          }
        ],
        "transitions": [
          {
            "state_name": "ReadOnly",
            "conditions": {
              "min_index_age": "7d"
            }
          }
        ]
      },
      {
        "name": "ReadOnly",
        "actions": [
          {
            "read_only": {}
          }
        ],
        "transitions": [
          {
            "state_name": "Delete",
            "conditions": {
              "min_index_age": "60d"
            }
          }
        ]
      },
      {
        "name": "Delete",
        "actions": [
          {
            "delete": {}
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": [
        "datahub_usage_event-*"
      ],
      "priority": 100
    }
  }
}{"_id":"datahub_usage_event_policy","_version":1,"_primary_term":1,"_seq_no":0,"policy":{"policy":{"policy_id":"datahub_usage_event_policy","description":"Datahub Usage Event Policy","last_updated_time":1659029170348,"schema_version":1,"error_notification":null,"default_state":"Rollover","states":[{"name":"Rollover","actions":[{"rollover":{"min_index_age":"1d"}}],"transitions":[{"state_name":"ReadOnly","conditions":{"min_index_age":"7d"}}]},{"name":"ReadOnly","actions":[{"read_only":{}}],"transitions":[{"state_name":"Delete","conditions":{"min_index_age":"60d"}}]},{"name":"Delete","actions":[{"delete":{}}],"transitions":[]}],"ism_template":[{"index_patterns":["datahub_usage_event-*"],"priority":100,"last_updated_time":1659029170348}]}}}
>>> creating _template/datahub_usage_event_index_template ...
{
  "index_patterns": ["datahub_usage_event-*"],
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "type": {
        "type": "keyword"
      },
      "timestamp": {
        "type": "date"
      },
      "userAgent": {
        "type": "keyword"
      },
      "browserId": {
        "type": "keyword"
      }
    }
  },
  "settings": {
    "index.opendistro.index_state_management.rollover_alias": "datahub_usage_event"
  }
}{"acknowledged":true}
>>> deleting invalid datahub_usage_event ...
{"acknowledged":true}
>>> creating datahub_usage_event-000001 ...
{
  "aliases": {
    "datahub_usage_event": {
      "is_write_index": true
    }
  }
}
2022/07/28 17:26:11 Command finished successfully.
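
The failure message and hint from the first run of Case 2 could be produced by something like the following sketch. The function name report_failure and its argument order are hypothetical, and the hostname check against amazonaws.com is an assumed heuristic; the PR text only says that the script "tries to detect" the situation.

```shell
#!/bin/sh
# Hypothetical sketch; the amazonaws.com hostname check is an assumption.

# Arguments: $1 HTTP method, $2 path, $3 HTTP status,
#            $4 endpoint URL, $5 value of USE_AWS_ELASTICSEARCH (default: false)
report_failure() {
  echo ">>> failed to $1 $2 ($3) !"
  # Only hint when the flag is not already "true" and the endpoint looks like AWS.
  if [ "${5:-false}" != "true" ]; then
    case "$4" in
      *amazonaws.com*)
        echo "... looks like AWS OpenSearch is used; please set USE_AWS_ELASTICSEARCH env value to true" ;;
    esac
  fi
}

report_failure GET _ilm/policy/datahub_usage_event_policy 401 \
  "https://xxx.es.amazonaws.com:443" false
```

With the flag already set to true, or with a non-AWS endpoint, only the first line is printed.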

Case 3: no-change

  • Redeploy with some unrelated bogus change
  • Result: analytics still working
elasticsearch-setup-job log
2022/07/28 17:28:32 Waiting for: https://xxx.es.amazonaws.com:443
2022/07/28 17:28:32 Received 200 from https://xxx.es.amazonaws.com:443

>>> _opendistro/_ism/policies/datahub_usage_event_policy already exists ✓

>>> _template/datahub_usage_event_index_template already exists ✓

>>> datahub_usage_event-000001 already exists ✓
2022/07/28 17:28:33 Command finished successfully.
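
The three cases suggest a create-if-missing pattern keyed on the HTTP status of a GET for each resource. A minimal sketch of that decision follows, with a hypothetical function name decide_action. The mapping of 404 to "create" is an assumption, since the logs only show the outcomes (created, already exists, failed with 401).

```shell
#!/bin/sh
# Hypothetical sketch of the per-resource decision; not the script's actual code.

# Maps the HTTP status of a GET on a resource to the action to take.
decide_action() {
  case "$1" in
    200) echo "skip" ;;    # resource already exists, nothing to do
    404) echo "create" ;;  # assumed: resource missing, safe to create
    *)   echo "fail" ;;    # e.g. 401: stop and report the error
  esac
}

for status in 200 404 401; do
  echo "$status -> $(decide_action "$status")"
done
```

Treating every unexpected status as a failure keeps a rerun safe: the job either leaves existing resources alone or stops before creating anything in a misconfigured state.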

☑️ Checklist

  • [x] The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • [x] Links to related issues
  • [x] Tests for the changes have been added/updated (not applicable)
  • [x] Docs related to the changes have been added/updated (adding several comments into the script itself)
  • [x] For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub (no downtime expected)

tomas-kubin • Jul 27 '22 15:07

Unit Test Results (build & test)

584 tests ±0   580 :heavy_check_mark: ±0   12m 48s :stopwatch: -7s
143 suites ±0   4 :zzz: ±0
143 files ±0   0 :x: ±0

Results for commit 08736725. ± Comparison against base commit 9e7bd1a8.

:recycle: This comment has been updated with latest results.

github-actions[bot] • Jul 27 '22 17:07