
Create Clear for Harvest2.0 Harvest source

Open jbrown-xentity opened this issue 8 months ago • 0 comments

User Story

In order to reset or restart a harvest source, data.gov admins want a clear function/API route.

Acceptance Criteria


  • [ ] GIVEN the /harvest/{id}/clear API route is created
    WHEN I call the harvest API at /harvest/{id}/clear
    THEN the datasets are removed from CKAN
    AND the dataset records/errors/jobs are removed from the harvest DB
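A minimal sketch of the behaviour the acceptance criterion describes, assuming the clear operation is given a CKAN client and a harvest DB handle. `FakeCKAN`, `FakeHarvestDB`, `clear_harvest_source`, and their method names are hypothetical stand-ins for illustration, not the harvester's actual interfaces; a real CKAN client would presumably call CKAN's `dataset_purge` action.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCKAN:
    """Hypothetical stand-in for a CKAN client holding dataset ids."""
    datasets: set = field(default_factory=set)

    def dataset_purge(self, dataset_id):
        # A real client would call the CKAN dataset_purge action here.
        self.datasets.discard(dataset_id)


@dataclass
class FakeHarvestDB:
    """Hypothetical stand-in for the harvest DB, keyed by harvest source id."""
    records: dict = field(default_factory=dict)  # source id -> CKAN dataset ids
    jobs: dict = field(default_factory=dict)     # source id -> job ids
    errors: dict = field(default_factory=dict)   # source id -> error ids

    def dataset_ids(self, source_id):
        return list(self.records.get(source_id, []))

    def delete_source_data(self, source_id):
        # Remove records, jobs, and errors for this source only.
        for table in (self.records, self.jobs, self.errors):
            table.pop(source_id, None)


def clear_harvest_source(source_id, ckan, db):
    """Purge the source's datasets from CKAN, then delete its harvest DB rows.

    Returns the number of datasets purged.
    """
    dataset_ids = db.dataset_ids(source_id)
    for dataset_id in dataset_ids:
        ckan.dataset_purge(dataset_id)
    db.delete_source_data(source_id)
    return len(dataset_ids)
```

Purging from CKAN before deleting the DB rows keeps the list of dataset ids available until the CKAN step has finished, so a failed purge can be retried from the same records.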

Background

Very helpful for testing, and occasionally useful for resetting a harvest source that has become corrupted or out of sync. Similar to CKAN clear functionality.

Security Considerations (required)

The route should require authentication, but no additional security measures are required.

Sketch

Eventually the CKAN removal piece may become so cumbersome (i.e., take longer than the restart time [15 minutes?]) that we'll want to implement that piece as a subtask. For this instance, just use the API normally: try to run a CKAN dataset purge, and also run the DB delete/clear commands. Ideally, if everything is synced correctly, you should be able to remove the harvest jobs and let the deletes cascade to the other foreign objects, but this might require config changes or workarounds.
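The idea of deleting harvest jobs and letting dependent rows follow can be sketched with foreign keys declared `ON DELETE CASCADE`. The example below uses an in-memory SQLite database purely so it is self-contained; the table and column names (`harvest_job`, `harvest_record`, `harvest_error`, `job_id`, `source_id`) are assumptions for illustration and are not taken from the actual harvest DB schema. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which is one example of the kind of config change that may be needed.

```python
import sqlite3

# Hypothetical schema: records and errors reference a job and cascade on delete.
SCHEMA = """
CREATE TABLE harvest_job (
    id TEXT PRIMARY KEY,
    source_id TEXT NOT NULL
);
CREATE TABLE harvest_record (
    id TEXT PRIMARY KEY,
    job_id TEXT NOT NULL REFERENCES harvest_job(id) ON DELETE CASCADE
);
CREATE TABLE harvest_error (
    id TEXT PRIMARY KEY,
    job_id TEXT NOT NULL REFERENCES harvest_job(id) ON DELETE CASCADE
);
"""


def make_db():
    """Create an in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    return conn


def clear_jobs(conn, source_id):
    """Delete all jobs for a source; dependent records and errors cascade.

    Returns the number of job rows deleted directly.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "DELETE FROM harvest_job WHERE source_id = ?", (source_id,)
        )
        return cur.rowcount


def count_rows(conn, table):
    """Count rows in one of the known tables."""
    if table not in ("harvest_job", "harvest_record", "harvest_error"):
        raise ValueError(f"unknown table: {table}")
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

If the cascade is not declared on the constraints, the same effect needs explicit deletes on the child tables before the jobs are removed.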

jbrown-xentity, Jun 10 '24 17:06