Create Harvest Runner Load Manager Script in Flask

Open btylerburton opened this issue 10 months ago • 1 comments

User Story

In order to kick off jobs in the correct order and to balance the load on the runner, datagovteam wants to create a script that runs on Flask Admin App startup and whenever a job completes.

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • [ ] GIVEN the LoadManager script has been invoked, either by container startup or by the completion of a previous job that has invoked the "harvest_job/complete" route, THEN it will check the env var CF_INSTANCE_INDEX to confirm it is "0", the primary instance.

  • [ ] GIVEN we are in the primary instance, THEN the LoadManager will check the harvest_jobs table for any jobs marked "pending" or "pending_manual" AND it will check the load on the runner by running a cf tasks command and counting the number of currently running jobs.

  • [ ] GIVEN the LoadManager has found jobs in the required states above AND the number of running jobs is less than the agreed-upon limit (3 in this case), THEN it will take the next job in "pending_manual" or "pending", prioritizing "pending_manual", and invoke a new CF task with the correct harvest_source id AND it will mark that job as "RUNNING" in the DB.
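
The three criteria above can be sketched as a single scheduling pass. This is only an illustration of the intended decision logic, not the actual implementation: `HarvestJob`, `pick_next_job`, `schedule_once`, and the `start_task` callback are hypothetical names, and the DB query and the CF task invocation are passed in by the caller so the logic can be exercised without a database or the cf CLI.

```python
from dataclasses import dataclass
from typing import Callable, List, Mapping, Optional

MAX_RUNNING_JOBS = 3  # the agreed-upon limit from the acceptance criteria

# Lower rank means higher priority: "pending_manual" goes before "pending".
PRIORITY = {"pending_manual": 0, "pending": 1}


@dataclass
class HarvestJob:
    """Hypothetical in-memory stand-in for a row in the harvest_jobs table."""
    id: str
    harvest_source_id: str
    status: str


def is_primary_instance(env: Mapping[str, str]) -> bool:
    """Only instance "0" is allowed to schedule work."""
    return env.get("CF_INSTANCE_INDEX") == "0"


def pick_next_job(jobs: List[HarvestJob]) -> Optional[HarvestJob]:
    """Return the next schedulable job, preferring "pending_manual"."""
    candidates = [job for job in jobs if job.status in PRIORITY]
    if not candidates:
        return None
    # min() returns the first minimal element, so jobs with the same
    # status keep the order in which they were passed in.
    return min(candidates, key=lambda job: PRIORITY[job.status])


def schedule_once(
    env: Mapping[str, str],
    jobs: List[HarvestJob],
    running_count: int,
    start_task: Callable[[str], None],
) -> Optional[HarvestJob]:
    """Run one scheduling pass and return the job that was started, if any."""
    if not is_primary_instance(env):
        return None
    if running_count >= MAX_RUNNING_JOBS:
        return None
    job = pick_next_job(jobs)
    if job is None:
        return None
    start_task(job.harvest_source_id)  # assumed to invoke the new CF task
    job.status = "RUNNING"             # a real version would persist this to the DB
    return job
```

Under these assumptions, both container startup and the "harvest_job/complete" route would call `schedule_once` with `os.environ`, the current rows from harvest_jobs, and the running count derived from cf tasks.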

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

We have decided to allow the above LoadManager script to stand in for a formal queue system because we don't have a need for the robustness and distributed capabilities of a first-class queue.

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

None

Sketch

  • [ ] Create LoadManager script
  • [ ] Configure it to run on container startup
  • [ ] Configure it to run when a harvest_job/complete route is invoked
  • [ ] LoadManager should check that it is running on the primary instance by checking the CF_INSTANCE_INDEX environment variable (or have a way to note running in "dev" mode, so it can run locally)
  • [ ] It should check for currently running tasks to calculate runner load / availability
  • [ ] It should check for jobs marked "pending" or "pending_manual"; if any are present and all the other conditions are satisfied, it should kick off a new task and then mark that job as "RUNNING" in the DB.
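
As a rough sketch of the primary-instance check (with a local "dev" override) and the runner load calculation from the list above: the `LOAD_MANAGER_DEV_MODE` variable and all function names here are hypothetical, and the parser assumes that `cf tasks` prints a table whose data rows begin with a numeric id followed by the task name and state. That layout should be verified against the cf CLI version actually in use.

```python
import subprocess
from typing import List, Mapping, NamedTuple


class CfTask(NamedTuple):
    id: int
    name: str
    state: str


def should_run(env: Mapping[str, str]) -> bool:
    """True on the primary instance, or anywhere when dev mode is on."""
    if env.get("LOAD_MANAGER_DEV_MODE") == "1":  # hypothetical flag for local runs
        return True
    return env.get("CF_INSTANCE_INDEX") == "0"


def parse_cf_tasks(output: str) -> List[CfTask]:
    """Parse the text printed by `cf tasks APP_NAME` (assumed table layout)."""
    tasks = []
    for line in output.splitlines():
        parts = line.split()
        # Data rows start with a numeric task id; the banner and header do not.
        if len(parts) >= 3 and parts[0].isdigit():
            tasks.append(CfTask(int(parts[0]), parts[1], parts[2]))
    return tasks


def count_running(tasks: List[CfTask]) -> int:
    """Runner load is the number of tasks currently in the RUNNING state."""
    return sum(1 for task in tasks if task.state == "RUNNING")


def fetch_cf_tasks_output(app_name: str) -> str:
    """Shell out to the cf CLI; requires a logged-in cf session."""
    result = subprocess.run(
        ["cf", "tasks", app_name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Keeping the parsing separate from the subprocess call means the load calculation can be tested against captured output without a Cloud Foundry login.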

btylerburton avatar Apr 22 '24 20:04 btylerburton

We want to be able to get a cf task by name, not just by id. Relevant code

rshewitt avatar Apr 24 '24 18:04 rshewitt