
Migrate the checklistbank index builder to Spark 3 and K8s

Open fmendezh opened this issue 1 year ago • 1 comment

Currently, an Oozie workflow is used to build an Elasticsearch index from scratch. The workflow has two main tasks:

  1. AvroExporterApp.java: This task reads from the NameUsage API and exports the data as Avro records that can later be imported into Elasticsearch.
  2. EsBackfill: This task reads the exported Avro records and creates a new Elasticsearch index. It also handles alias and index swapping.

This process needs to be migrated to Apache Airflow and Spark 3.5.1.
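
For reference, the alias swap that EsBackfill performs can be isolated as its own small step. Below is a minimal sketch in Python (the language Airflow tasks are written in) that only builds the request body for Elasticsearch's `POST /_aliases` endpoint. The function name and the alias and index names are hypothetical and do not come from the existing code.

```python
import json


def build_alias_swap_actions(alias, new_index, old_indices):
    """Build the JSON body for an Elasticsearch POST /_aliases request.

    All actions in one request are applied atomically, so the alias
    never points at zero indices while the swap happens.
    """
    # Detach the alias from every index that currently carries it.
    actions = [{"remove": {"index": old, "alias": alias}} for old in old_indices]
    # Attach the alias to the freshly built index.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    # Hypothetical alias and index names, for illustration only.
    body = build_alias_swap_actions(
        "nameusage", "nameusage_new", ["nameusage_old"]
    )
    print(json.dumps(body, indent=2))
```

Keeping this as a pure function means the swap could be triggered from a separate Airflow task (or by hand) without touching the Spark job that builds the index.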

fmendezh avatar Oct 24 '24 08:10 fmendezh

It would be good to keep the alias swapping and the index build separate; we never run them as one job.

mdoering avatar Oct 24 '24 08:10 mdoering