PheKnowLator

Setting Up an End-to-End CI/CD Framework

Open callahantiff opened this issue 3 years ago • 2 comments

Task

Task Type: INFRASTRUCTURE

Determine which tools we will use to set up an end-to-end CI/CD framework.

TODO

The requirements for this system include:

  • Leveraging GitHub Actions to:
    • Test the codebase
    • Download needed resources and build the Docker container
    • Deploy and run the Docker container via Google Cloud Run (one for each KG build type)
    • Generate baseline embeddings (#71)
    • Return all results
    • Push certain files to the Neo4j instance and SPARQL endpoint

Potential Configurations:

  • CI/CD with Serverless Containers on GCP - Described here
  • Consider using Google Cloud Composer to kick off the first task of the monthly build process, which downloads and preprocesses the data used for each build (LOD and Ontology data)

Proposed Tasks for CI/CD

  • Download all LOD and Ontology data
  • Preprocess and Clean data
  • KG Build
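
The three proposed tasks depend on each other in order, so one way to picture the orchestration is a small runner that executes each stage in sequence and stops at the first failure. This is only a minimal sketch: `run_pipeline` and the stage names are hypothetical and are not part of the PheKnowLator codebase, and the real stages would likely run as separate CI/CD jobs rather than in-process functions.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger("kg_build")

# A stage is a (name, zero-argument callable) pair. Both are placeholders here.
Stage = Tuple[str, Callable[[], None]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first one that raises.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, stage in stages:
        logger.info("starting stage: %s", name)
        try:
            stage()
        except Exception:
            # Log the traceback and skip the downstream stages,
            # since each one consumes the previous stage's output.
            logger.exception("stage failed: %s", name)
            break
        completed.append(name)
    return completed
```

Called with `[("download", ...), ("preprocess", ...), ("kg_build", ...)]`, the returned list shows how far the build got, which is the information a monthly build log would need.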

Related GitHub Issues: #47, #49

callahantiff • Dec 21 '20 21:12

TODO

  • [x] Script out data download and write to GCS (TASK 1)
    • If any download fails, fall back to the last build's version of the downloaded data and log the issue
  • [x] Script out preprocessing of LOD and Ontology data (TASK 2)
    • Log any issues
    • Output updated resource_info.txt, edge_source_list.txt, and ontology_source_list.txt to the Docker container
    • Decide whether ontologies are merged in TASK 2 and, if so, whether the merged data is also sent to resources/knowledge_graphs in the Docker container
  • [x] Update Docker build trigger to pull from TASKS 1-2 (TASK 3)
  • [x] Add script that runs after each successful build and copies data from release_v2.0.0/archived_builds/build_DDMMYYY to release_v2.0.0/current_build/
  • [ ] Update SPARQL Endpoint and Neo4J
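
Two items in this list lend themselves to a short illustration: falling back to the last build's copy when a download fails, and copying a finished build into the current-build location. The sketch below assumes local directories stand in for the GCS bucket, and the function names `fetch_with_fallback` and `promote_build` are hypothetical; the actual scripts may be structured quite differently.

```python
import http.client
import logging
import shutil
import urllib.request
from pathlib import Path

logger = logging.getLogger("kg_build.download")


def fetch_with_fallback(url: str, dest: Path, previous: Path, timeout: float = 60.0) -> bool:
    """Download url to dest; if that fails, copy previous to dest instead.

    Returns True when the fresh download succeeded and False when the
    previous build's file was reused. Raises FileNotFoundError if the
    download fails and previous does not exist.
    """
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a half-finished download never
    # replaces a good file at dest.
    partial = dest.with_name(dest.name + ".part")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(dest)
        return True
    except (OSError, ValueError, http.client.HTTPException) as err:
        # URLError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        logger.warning("download failed for %s (%s); reusing %s", url, err, previous)
        partial.unlink(missing_ok=True)
        shutil.copy2(previous, dest)
        return False


def promote_build(archived: Path, current: Path) -> None:
    """Replace the contents of current with a copy of the archived build."""
    if current.exists():
        shutil.rmtree(current)
    shutil.copytree(archived, current)
```

Logging the failure and returning a flag, rather than raising, lets the build continue with stale data while still leaving a record of which resources were not refreshed.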

callahantiff • Dec 26 '20 21:12

Nearly done. The jobs run too long for GitHub-hosted runners in GitHub Actions, so we need to explore options for a self-hosted runner. Considering using Terraform.

callahantiff • Jan 31 '21 22:01