Explore module based refactor
@adrianna-chang-shopify and I paired on exploring the idea of a module based refactor to see if it would make it easier to provide a simple API to consumers for unsupported enumerator types. See discussion in #307.
This would be a breaking change, albeit one with a fairly straightforward migration path:
class ProductTask < Task
+ include MaintenanceTasks::Adapters::ActiveRecord
def collection Product.all end
def process(product) puts("Processed #{product}") end
end
class NameTask < Task
+ include MaintenanceTasks::Adapters::Array
def collection %w[Alice Bob Charlotte] end
def process(name) puts("Processed #{name}") end
end
class DBTask < Task
- include MaintenanceTasks::CsvTask
+ include MaintenanceTasks::Adapters::CSV
def process(row) puts("Processed #{row}") end
end
(In fact, if we just reused the existing CSV module, that change wouldn't be breaking)
The tradeoff this makes is that the consumer needs to choose the appropriate module to include, instead of us inferring the collection type.
If this approach makes sense, it can be fleshed out.
We should not use inheritance when what we want is to change strategy. Use the strategy pattern please.
Interesting point @rafaelfranca. One difficulty I see is that the interface for each enumeration strategy isn't always the same:
- The
ActiveRecord::RelationandArraystrategies want acollection - The
CSVstrategy wants acsv_content - The "custom enumerator" strategy is open ended
- Consumers might define an enumerator which has everything it needs
- Consumers might define an enumerator which expects some input (e.g. which resource to query from some API)
Is the strategy pattern still a good choice if each strategy has different requirements?
It seems to me like this would lead to two possible implementations:
Creating the strategy with all context
def strategy
ActiveRecordStrategy.new(Product.all)
end
Having the strategy delegate to the task for context
def strategy
ActiveRecordStrategy.new(self)
end
def collection
Product.all
end
Is one of those what you had in mind? To me, they both seem less simple for consumers than the mixin approach.
Also, the system can know which strategy needs to be used in the system, why are we asking users to specify it?
To automatically select the strategy, we need the collection (so we can check its type), which leads the the awkwardness discussed in #307, where we either need to return a special collection, or override the method that calls collection. Did you have another alternative in mind?
Based on @rafaelfranca 's feedback here and the followup on Slack about changing CsvTask to be an abstract class instead of a mixin, I'm thinking we might want to revisit the solution you suggested with inheritance @sambostock .
EnumeratorTask would be the top-level class in the hierarchy, with CollectionTask and CsvTask both inheriting from it and supplying their own behaviour. We'll reintroduce the concept of abstract class (see our original implementation with abstract_class) to deal with the DescendantsTracker and only show concrete Tasks in the UI.
This would involve a change to our API (Task -> CollectionTask for existing Tasks), unless we alias Task to CollectionTask as Sam mentioned - but tweaking our API a bit this early is not a huge concern to me. What do you think @rafaelfranca ?
I think I'm leaning towards exposing our own enumerator_builder method but without exposing the job-iteration builder or the cursor.
The enumerator builder should be an object that follows some API and that return an enumerator when asked to based on the cursor that it receives as argument.
Something like:
class MyEnumeratorBuilder
def initialize(collection)
@collection = collecion
end
def enumerator(cursor:)
# do what needs to be done and return an object that responds to `each` and uses the cursor
end
end
Then in the task:
class Task
def enumerator_builder
MyEnumeratorBuilder.new(collection)
end
end
In the Task superclass:
class TaskSuperclass
def enumerator_builder
collection = self.collection
case collection
when ActiveRecord::Relation
SomethingThatWillKnowHowToReturnTheActiveRecordEnumerator.new(collection)
when Array
SomethingThatWillKnowHowToReturnTheArrayEnumerator.new(collection)
when CSV
CsvEnumeratorBuilder.new(collection)
else
raise ArgumentError, "#{self.class.name}#collection must be either "\
'an Active Record Relation, an Array, or a CSV.'
end
end
It is similar to what you got in #307, but instead of passing any context to the task about the job-iteration enumerator builder and the curse we have our own API for enumerator builders.
@rafaelfranca your idea is indeed pretty similar to #307, though I had chosen .call so that simple cases could use a proc or lambda. It's not much work for consumers to define their own class though.
Do you foresee a case in which we would have to pass enumerator more than just the cursor? The only one I can think of is passing it JobIteration's enumerator_builder (which we can get around for everything except ThrottleEnumerator by building a new one with nil job), but the reason I ask is that it would be tricky to add an additional keyword argument down the road without breaking backwards compatibility.
However, if we passed in an object which responds to .cursor, then we could easily add more to it in the future while maintaining backwards compatibility. Basically the EnumeratorContext from #307:
EnumeratorContext = Struct.new(:cursor, keyword_init: true)
I also have strong opinions that we should not pass the Job Iteration's enumerator_builder. I see no reason for that object to exist. If we have a well-defined API for enumerators, we don't need to have a special object that know how to build that API.
The other argument that we may need in the future is the job so we can build the enumerators. I think passing the context object can make easy to add that argument later when we need to.
Lots of moving pieces across the refactors. 😅 Thanks for catching the various things I missed @adrianna-chang-shopify!
@rafaelfranca can you let us know what your thoughts are?
My thoughts are still the same. We should not use inheritance to change behavior by default. We can already infer the behavior based on the collection. Why we do need to ask people to tell us something that we can know already?
I'm ok with people that want custom enumerators using polymorphism to change the enumerator in their tasks, but the cases we support in this gem should not use any different superclass or include any new module (which is a different form of inheritance).
What I'm saying that we should have only one "abstract" class (possibly two with CSVTask but I still think we can remove that class) and only concrete classes in the applications.
Superseded by https://github.com/Shopify/maintenance_tasks/pull/944, so I am going to close this for now.