📦[Feature] Support for async job offloading
Description
Closes #320
This adds support for async jobs: you submit an audio file for processing and immediately receive a `job_id` that you can check back on later.
- Job data is stored in a SQLite DB
- Transcription results are **not** stored in the SQLite DB; they are kept in a set of jobs instead
- Jobs are processed in the background, in order, via a queue (see the sketch after this list)
- Job data is only retained for a limited time (see cleanup below)
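For context on the queue bullet, here is a minimal sketch of the single-worker pattern; the `Job` and `transcribe` names are placeholders, not the PR's actual code:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Job:
    audio: bytes
    status: str = "queued"
    result: dict | None = None

queue: asyncio.Queue = asyncio.Queue()

async def transcribe(audio: bytes) -> dict:
    # Stand-in for the real ASR pipeline.
    await asyncio.sleep(0)
    return {"text": "..."}

async def worker() -> None:
    # Pull jobs one at a time, in submission order.
    while True:
        job = await queue.get()
        try:
            job.result = await transcribe(job.audio)
            job.status = "completed"
        except Exception:
            job.status = "failed"
        finally:
            queue.task_done()
```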
Jobs are cleaned up under two conditions: https://github.com/syntaxsdev/whisper-asr-webservice/blob/3678d2aff95aff673fd4496e5aa8c2b11c1a6ae6/app/config.py#L56-L59
```python
# How long to keep a batch process after its value has been read - Default is 30 minutes
JOB_CLEANUP_AFTER_READ = int(os.getenv("JOB_CLEANUP_AFTER_READ", 1800))
# How long to keep a batch process after it has been abandoned (not read) - Default is 24 hours
JOB_CLEANUP_ABANDONED = int(os.getenv("JOB_CLEANUP_ABANDONED", 86400))
```
This means that once a job has been processed, you have:
- 24 hours (default) to read the result, or it is considered abandoned and deleted
- 30 minutes (default) after you read it before it is deleted
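So a job that finishes at 12:00 and is read at 12:10 is deleted around 12:40; if it is never read, it is deleted about 24 hours after completion. As a sketch of that rule (assuming each job tracks when it finished and when it was first read; not the PR's actual code):

```python
import time

# Defaults mirror JOB_CLEANUP_AFTER_READ / JOB_CLEANUP_ABANDONED above.
def is_expired(finished_at: float, read_at: float | None, now: float | None = None) -> bool:
    now = time.time() if now is None else now
    if read_at is not None:
        return now - read_at > 1800      # deleted 30 min after being read
    return now - finished_at > 86400     # deleted 24 h after completion if never read
```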
Usage
`POST /asr` - just set the `async_job` param to `true`
Example response:
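Roughly, the submit call comes back with the job handle; the field names here are illustrative, not copied verbatim from the implementation:

```python
{"job_id": "a1b2c3d4", "status": "queued"}
```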
To retrieve:
`GET /asr/{job_id}` (new endpoint)
Example response:
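Roughly, once processing finishes the job carries its result; the shape below is illustrative, with speaker labels since diarization was enabled in this run (see the note below):

```python
{
    "job_id": "a1b2c3d4",
    "status": "completed",
    "result": {
        "text": "...",
        "segments": [
            {"start": 0.0, "end": 4.2, "speaker": "SPEAKER_00", "text": "..."},
        ],
    },
}
```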
In this particular example, I also used diarization with the WhisperX model.
If a failure occurs, the job's status will show as `failed`, and it will also be cleaned up after the `JOB_CLEANUP_AFTER_READ` period.
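Put together, a client can submit and then poll until the job settles. A minimal sketch (host/port, field names, and status values are assumptions, not taken from this PR):

```python
import time

import requests

BASE = "http://localhost:9000"  # default port of the published image

# Submit the audio as an async job; this returns immediately with a job id.
with open("audio.mp3", "rb") as f:
    submitted = requests.post(
        f"{BASE}/asr",
        params={"async_job": "true"},
        files={"audio_file": f},
    ).json()

# Poll the new endpoint until the job completes or fails.
job = submitted
while job["status"] not in ("completed", "failed"):
    time.sleep(5)
    job = requests.get(f"{BASE}/asr/{submitted['job_id']}").json()

print(job)
```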
Other Notes
I've built this with eventual support for async batch jobs in mind, where you can upload multiple files at once (or add multiple files to a job) and then kick off the job later, which is why the output is structured as it is.
I think a separate PR would be warranted for that.
Testing
- Tested both locally and containerized (CPU/GPU).
- Verified it works in Kubernetes (OpenShift).
- Works with a 921 MB (nearly 1 GB) audio file; tested on GPU, it took 2 minutes.
Test containers:
```
docker.io/syntaxsdev/whisper-asr-webservice:latest
docker.io/syntaxsdev/whisper-asr-webservice:latest-gpu
```
bump @ahmetoner :)