badgerdoc
BadgerDoc can execute pipelines by retrieving tasks and revisions from `finished` jobs
BadgerDoc should allow users to send file revisions back to the pipeline engine to enhance ML models following manual checks or annotations. After an annotation is committed, each file has revisions. We should be able to send the latest revision to the pipeline with the file. However, you don't need to select a file for revision; selecting `finished` jobs will automatically retrieve tasks, files, and revisions.
Back-end
- We need to add or verify existing functionality to get a list of the latest revisions by task or by file.
- User validation must be added: users can create new jobs from `finished` jobs only.
- When a new job is created, the back-end must check whether a list of files, datasets, or jobs is being passed.
- Add a new field `previous_jobs` to the `job` table. This field must be `JSONB` and contain the IDs of the passed jobs. All other fields (tasks, files) should be filled as they are now.
- BadgerDoc must send an event to Pipelines similar to the example below:
    {
      "files_data": [
        {
          "revision": "00afbbcd-9628-479b-89cb-25aa893f46f4",
          "bucket": "local",
          "input": {
            "job_id": 47
          },
          "input_path": "files/344/344.pdf",
          "output_path": null,
          "pages": [8, 9],
          "s3_signed_url": "http://badgerdoc-minio:9000/local/files/344/344.pdf?AWSAccessKeyId=minioadmin&Signature=TfJOWzctdD8UcPkg3EsQBvpU8go%3D&Expires=1715783449"
        },
        {
          "revision": "cd13076f-9c10-4afc-bc8d-dbeca34ee857",
          "bucket": "local",
          "input": {
            "job_id": 46
          },
          "input_path": "files/344/344.pdf",
          "output_path": null,
          "pages": [8, 9],
          "s3_signed_url": "http://badgerdoc-minio:9000/local/files/344/344.pdf?AWSAccessKeyId=minioadmin&Signature=TfJOWzctdD8UcPkg3EsQBvpU8go%3D&Expires=1715783449"
        }
      ],
      "job_id": 48,
      "tenant": "local"
    }
The `revision` field is filled in with the latest task revision.
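As a rough illustration of how the event above could be assembled, the sketch below picks the newest revision for each (job, file) pair and builds the `files_data` entries from them. `TaskRevision`, `latest_revisions`, and `build_event` are hypothetical names, not existing BadgerDoc code. The sketch also assumes that the bucket equals the tenant and that `input_path` follows the `files/<id>/<id>.pdf` pattern seen in the example. The `s3_signed_url` field is left out because generating it depends on the storage client.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Iterable, List, Tuple


@dataclass
class TaskRevision:
    """Hypothetical record of one committed annotation revision."""

    revision: str          # revision ID
    job_id: int            # finished job the revision belongs to
    file_id: int
    pages: List[int]
    created_at: datetime   # used only to decide which revision is newest


def latest_revisions(revisions: Iterable[TaskRevision]) -> List[TaskRevision]:
    """Keep only the newest revision for each (job, file) pair."""
    latest: Dict[Tuple[int, int], TaskRevision] = {}
    for rev in revisions:
        key = (rev.job_id, rev.file_id)
        if key not in latest or rev.created_at > latest[key].created_at:
            latest[key] = rev
    # Sort for a deterministic order; the spec does not prescribe one.
    return sorted(latest.values(), key=lambda r: (r.job_id, r.file_id))


def build_event(
    new_job_id: int, tenant: str, revisions: Iterable[TaskRevision]
) -> Dict[str, Any]:
    """Build an event shaped like the example payload (without signed URLs)."""
    files_data = [
        {
            "revision": rev.revision,
            "bucket": tenant,  # assumption: bucket name matches the tenant
            "input": {"job_id": rev.job_id},
            # Assumption: path pattern inferred from the example payload.
            "input_path": f"files/{rev.file_id}/{rev.file_id}.pdf",
            "output_path": None,
            "pages": rev.pages,
        }
        for rev in latest_revisions(revisions)
    ]
    return {"files_data": files_data, "job_id": new_job_id, "tenant": tenant}
```

Calling `build_event(48, "local", revisions)` with revisions from jobs 46 and 47 would yield one `files_data` entry per job, each carrying the newest revision ID for file 344.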
Users can start both Extraction jobs and Extraction and Annotation jobs. However, for now, we won't implement different behavior for the Annotation part. In the future, we will use the passed revision as the base for the annotation.
Front-end
On the dataset selection screen, we need to add a tab for selecting jobs instead of files. Users can select either files or jobs; we shouldn't allow both to be selected for one job.
All other scenarios remain the same. However, when creating a job, the form must send revisions instead of files.
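The selection rules from both sections (files or jobs but not both, and only `finished` jobs as a source) could be enforced on the back-end with a check along these lines. This is a minimal sketch: `resolve_previous_jobs`, `JobCreationError`, the `job_statuses` mapping, and the literal status string `"finished"` are assumptions for illustration, and datasets are treated like files when checking exclusivity.

```python
from typing import Dict, List, Optional


class JobCreationError(ValueError):
    """Raised when a job creation request breaks the selection rules."""


def resolve_previous_jobs(
    files: Optional[List[int]],
    datasets: Optional[List[int]],
    jobs: Optional[List[int]],
    job_statuses: Dict[int, str],
) -> List[int]:
    """Validate the request and return the value for `previous_jobs`.

    `job_statuses` is a hypothetical lookup of job ID to status; in a real
    service this would come from the job table.
    """
    files, datasets, jobs = files or [], datasets or [], jobs or []

    # Files/datasets and jobs are mutually exclusive sources for one job.
    if jobs and (files or datasets):
        raise JobCreationError("Select either files/datasets or jobs, not both.")
    if not (files or datasets or jobs):
        raise JobCreationError("No files, datasets, or jobs were passed.")

    # Only finished jobs may be used to create a new job.
    not_finished = [j for j in jobs if job_statuses.get(j) != "finished"]
    if not_finished:
        raise JobCreationError(f"Jobs are not finished: {not_finished}")

    # An empty list means the job was created from files or datasets.
    return list(jobs)
```

The returned list is what would be stored in the `previous_jobs` field; unknown job IDs are rejected the same way as unfinished ones because they have no status in the lookup.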