maintenance_tasks
Provide support for enumerable collections beyond Array and ActiveRecord::Relation in TaskJob
Will be partially solved when CSV functionality is implemented, after which we will decide whether further collections should be supported.
👋 I was going to open a new issue, but this seemed like a more appropriate place to chime in.
TL;DR: Lack of support for custom enumerators prevents certain use cases, such as processing external resources.
I was investigating adopting this gem for our app (Shopify/athena), and support being limited to ActiveRecord::Relation & Array (and CSVs, once that is added) would block us from adopting it for many of our use cases.
Athena manages many resources in Twilio. One of the things we do is run a monthly job which iterates over certain resources which are over a month old and deletes them. For this, we use Shopify/job-iteration, and a custom enumerator which "streams" in all the results for our search, and iterates over them until they're all gone. In some other jobs which don't delete the resources, we use their start_time as an increasing cursor.
This works because JobIteration supports custom Enumerators. With the current MaintenanceTasks implementation, we would have to pre-load all the resources (or their IDs) into an Array before starting to process them, or resort to some other workaround.
It would be great if MaintenanceTasks supported custom enumerators, similarly to JobIteration!
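To make the "streaming" idea concrete, here is a minimal sketch in plain Ruby. It is not our actual code: `FakeRecordingsApi`, `recordings_enumerator`, and the `sid`/`start_time` fields are hypothetical stand-ins for a paginated API client, and the cursor assumes unique, increasing `start_time` values. The enumerator fetches one page at a time and yields item and cursor pairs, which, as I understand it, is the shape JobIteration expects from a custom enumerator.

```ruby
# Hypothetical stand-in for a paginated external API (not the real Twilio client).
class FakeRecordingsApi
  PAGE_SIZE = 2

  attr_reader :requests

  def initialize(recordings)
    @recordings = recordings.sort_by { |r| r[:start_time] }
    @requests = 0
  end

  # Returns up to PAGE_SIZE recordings whose start_time is strictly after `after`.
  def page(after:)
    @requests += 1
    @recordings.select { |r| after.nil? || r[:start_time] > after }.first(PAGE_SIZE)
  end
end

# Builds an Enumerator that requests pages only as they are consumed and
# yields each recording together with the cursor needed to resume after it.
# Assumes start_time values are unique, so "strictly after" skips nothing.
def recordings_enumerator(api, cursor: nil)
  Enumerator.new do |yielder|
    position = cursor
    loop do
      page = api.page(after: position)
      break if page.empty?

      page.each do |recording|
        position = recording[:start_time]
        yielder.yield(recording, position)
      end
    end
  end
end

api = FakeRecordingsApi.new((1..5).map { |i| { sid: "RE#{i}", start_time: i } })

# Taking a single item triggers a single page request, not a full pre-load.
recording, cursor = recordings_enumerator(api).first
puts "#{recording[:sid]} at cursor #{cursor} after #{api.requests} request(s)"
```

Nothing is loaded into an Array up front: the consumer pulls items, and pages are requested only when the previous one is exhausted.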
As per the discussion in https://github.com/Shopify/maintenance_tasks/pull/307#discussion_r561420323, I'm going to expand on some of our use cases, with respect to processing API resources in Twilio.
Most of these tasks run on a schedule, though we have had one-off tasks come and go.
- Synchronization tasks: We have tasks which iterate through resources in Twilio, typically using a timestamp field (created_at, updated_at, event_date, etc.) as a cursor, which we want to upsert into our SQL "mirror" database.
- Deletion tasks: We delete various resources as they exit our retention period. Some of these are "mirrored" in our database, so we enumerate over the records, deleting the resources and their record, one by one. For resources which are not mirrored, we must iterate via the API, again with some time based cursor, but this time with extra parameters to narrow the time range.
- One-off "backfill" tasks:
  - After adding code to automatically move call recordings from Twilio into GCS once ready, we had to iterate through existing recordings and move them over.
  - After adding some code to populate a worker attribute after worker creation, we had to backfill the attribute for all existing workers.
In these examples, we want to iterate over a set of resources
- not available as an ActiveRecord::Relation
- potentially large enough to want to avoid eagerly building an Array
- potentially expanding over the duration of the enumeration (i.e. as we process resources, more may be created)
- potentially large enough to warrant cursor-based resumption after interruption
- requiring a custom cursor
This strikes me as a good use case for building a custom enumerator, based on:
- MaintenanceTasks having no knowledge of the resource type, or how to query it
- MaintenanceTasks having no knowledge of how to construct a suitable cursor
- eagerly dumping the entire collection into an Array being expensive
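As a rough illustration of the cursor requirements above, the following plain-Ruby sketch simulates an interrupted run that later resumes from a saved cursor, while the collection grows in between. Everything here is hypothetical (`RemoteWorker`, `workers_after`, `workers_enumerator`, and the in-memory `store` standing in for a remote API); it only assumes a unique, increasing `created_at` to use as the cursor, and it does not use MaintenanceTasks or JobIteration.

```ruby
# Hypothetical record type standing in for a remote resource.
RemoteWorker = Struct.new(:sid, :created_at)

# Simulates one API query: the next `limit` workers created strictly after `cursor`.
def workers_after(store, cursor, limit: 2)
  store.select { |w| cursor.nil? || w.created_at > cursor }
       .min_by(limit, &:created_at)
end

# Yields [worker, cursor] pairs, re-querying the store for every batch so that
# resources created during the enumeration are still picked up.
def workers_enumerator(store, cursor: nil)
  Enumerator.new do |yielder|
    position = cursor
    loop do
      batch = workers_after(store, position)
      break if batch.empty?

      batch.each do |worker|
        position = worker.created_at
        yielder.yield(worker, position)
      end
    end
  end
end

store = [
  RemoteWorker.new("WK1", 10),
  RemoteWorker.new("WK2", 20),
  RemoteWorker.new("WK3", 30),
]
processed = []
saved_cursor = nil

# First run: interrupted after two items; only the cursor is persisted.
workers_enumerator(store).each do |worker, cursor|
  processed << worker.sid
  saved_cursor = cursor
  break if processed.size == 2
end

# A new resource is created while the task is paused.
store << RemoteWorker.new("WK4", 40)

# Second run: resumes strictly after the saved cursor and also sees WK4.
workers_enumerator(store, cursor: saved_cursor).each do |worker, cursor|
  processed << worker.sid
  saved_cursor = cursor
end

puts processed.inspect
```

The point of the sketch is that only the enumerator knows how to query the resource and what a valid cursor looks like; the code driving the iteration just stores whatever cursor it was last handed.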
@sambostock @rafaelfranca I see a few open PRs that would add support for enumerators. I think https://github.com/Shopify/maintenance_tasks/pull/326 is especially close to what I would need, but I also notice that none of them has seen activity in over 2 years.
Is there still an appetite to tackle this?