anonlink-entity-service
Run Scheduler
When a run is created, a task is added to the Celery queue that chunks the similarity scoring work into multiple tasks, and those tasks are added to the same Celery queue.
There is no scheduling, so a tiny job that would take less than a second can sit in the queue for hours behind a large one. At the other extreme, several large jobs together can produce so many chunks that they don't fit in the Celery queue.
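To make the head-of-line blocking concrete, here is a toy simulation of a single FIFO queue drained by one worker. It is a sketch under simplifying assumptions (one worker, every chunk enqueued at time zero, made-up chunk durations). It does not use Celery or the entity service's code, and `fifo_completion_times` is a hypothetical helper.

```python
from collections import deque


def fifo_completion_times(runs):
    """Simulate one FIFO queue drained by a single worker.

    `runs` is a list of (run_id, chunk_durations) in the order the runs were
    created. All chunks of a run are enqueued up front (all at t=0 here), and
    the worker processes them strictly in arrival order. Returns the time at
    which each run's last chunk finishes.
    """
    queue = deque()
    for run_id, chunks in runs:
        for duration in chunks:
            queue.append((run_id, duration))

    clock = 0.0
    finished = {}
    while queue:
        run_id, duration = queue.popleft()
        clock += duration
        finished[run_id] = clock  # overwritten until the run's last chunk
    return finished


# A large run of 1000 one-second chunks created just before a tiny 0.5 s run.
runs = [("large", [1.0] * 1000), ("tiny", [0.5])]
print(fifo_completion_times(runs))
# {'large': 1000.0, 'tiny': 1000.5}
```

The tiny run needs only half a second of work but finishes last, because every chunk of the earlier run is ahead of it in the queue.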
One idea is to introduce a new, coarser layer of chunking and apply a (fair) scheduler so that every active run gets a chance to make progress. Interested in expanding on that, or in considering other approaches.
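A minimal sketch of the round-robin idea, assuming each run is pre-split into coarse chunks that the scheduler holds itself instead of pushing them all to the Celery queue at once. `FairRunScheduler`, `add_run`, `next_batch`, `chunk_done` and `max_in_flight` are hypothetical names, not part of anonlink-entity-service. Submitting the returned chunks to Celery and reporting their completion back are left out.

```python
from collections import deque


class FairRunScheduler:
    """Hypothetical round-robin scheduler over active runs.

    Coarse chunks wait here rather than in the broker. At most `max_in_flight`
    chunks are released at a time, and runs take turns one chunk per turn, so
    a small run is not stuck behind a large one.
    """

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.pending = {}        # run_id -> deque of coarse chunks not yet dispatched
        self.rotation = deque()  # run_ids with pending work, in turn order
        self.in_flight = 0       # chunks dispatched but not yet reported done

    def add_run(self, run_id, coarse_chunks):
        chunks = deque(coarse_chunks)
        if chunks:  # runs with no work never enter the rotation
            self.pending[run_id] = chunks
            self.rotation.append(run_id)

    def next_batch(self):
        """Return (run_id, chunk) pairs to dispatch now, one chunk per run per turn."""
        batch = []
        while self.rotation and self.in_flight < self.max_in_flight:
            run_id = self.rotation.popleft()
            chunks = self.pending[run_id]
            batch.append((run_id, chunks.popleft()))
            self.in_flight += 1
            if chunks:
                self.rotation.append(run_id)  # back of the line for its next turn
            else:
                del self.pending[run_id]      # run has no coarse chunks left
        return batch

    def chunk_done(self):
        """Call when a dispatched chunk completes, freeing one slot."""
        self.in_flight -= 1


sched = FairRunScheduler(max_in_flight=4)
sched.add_run("large", [f"L{i}" for i in range(1000)])
sched.add_run("tiny", ["T0"])
print(sched.next_batch())
# [('large', 'L0'), ('tiny', 'T0'), ('large', 'L1'), ('large', 'L2')]
```

Because each active run gets one chunk per turn, the tiny run is dispatched in the first batch instead of after the large run's 1000 chunks, and capping `max_in_flight` keeps the number of tasks sitting in the broker bounded. Plain round-robin shares chunk counts rather than compute time, so it only approximates a fair split of worker time when coarse chunks have roughly equal cost; otherwise a weighted variant would be needed.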