checklistbank
Migrate the checklistbank index builder to Spark 3 and K8s
Currently, an Oozie workflow is used to build an Elasticsearch index from scratch. The workflow has two main tasks:
- AvroExporterApp.java: This task reads from the NameUsage API and exports the data as Avro records that can later be imported into Elasticsearch easily.
- EsBackfill: This task reads the exported Avro records and creates a new Elasticsearch index. It also handles alias and index swapping.
This process needs to be migrated to Apache Airflow and Spark 3.5.1.
It would be good to keep the alias swapping and the index build separate, since we never run them as one job.
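
As a rough illustration of what a standalone alias-swap step could look like, here is a minimal sketch that only builds the request body for Elasticsearch's `POST /_aliases` endpoint. The function name, the alias name and the index names are hypothetical and are not taken from the existing EsBackfill code; sending the request is left out on purpose.

```python
import json


def build_alias_swap_actions(alias, new_index, old_indices):
    """Build the body for POST /_aliases that moves `alias` to `new_index`.

    All names passed in are placeholders (assumptions for this sketch).
    Actions sent in a single _aliases request are applied atomically,
    so readers of the alias never see it missing.
    """
    # Detach the alias from every index that currently carries it.
    actions = [{"remove": {"index": old, "alias": alias}} for old in old_indices]
    # Attach the alias to the freshly built index.
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    # Hypothetical alias and index names, for illustration only.
    body = build_alias_swap_actions(
        alias="nameusage",
        new_index="nameusage_new",
        old_indices=["nameusage_old"],
    )
    print(json.dumps(body, indent=2))
```

Keeping this as its own step means the index build can finish and be checked before anything is switched, and the swap can be triggered separately as its own Airflow task or DAG.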