maintenance_tasks Explore module based refactor

@adrianna-chang-shopify and I paired on exploring the idea of a module based refactor to see if it would make it easier to provide a simple API to consumers for unsupported enumerator types. See discussion in #307.

This would be a breaking change, albeit one with a fairly straightforward migration path:

 class ProductTask < Task
+ include MaintenanceTasks::Adapters::ActiveRecord
   def collection Product.all end
   def process(product) puts("Processed #{product}") end
 end
 class NameTask < Task
+ include MaintenanceTasks::Adapters::Array
   def collection %w[Alice Bob Charlotte] end
   def process(name) puts("Processed #{name}") end
 end
 class DBTask < Task
- include MaintenanceTasks::CsvTask
+ include MaintenanceTasks::Adapters::CSV
   def process(row) puts("Processed #{row}") end
 end

(In fact, if we just reused the existing CSV module, that change wouldn't be breaking)

The tradeoff this makes is that the consumer needs to choose the appropriate module to include, instead of us inferring the collection type.

If this approach makes sense, it can be fleshed out.

Jan 22 '21 20:01 sambostock

We should not use inheritance when what we want is to change strategy. Use the strategy pattern please.

Interesting point @rafaelfranca. One difficulty I see is that the interface for each enumeration strategy isn't always the same:

The ActiveRecord::Relation and Array strategies want a collection
The CSV strategy wants a csv_content
The "custom enumerator" strategy is open ended
- Consumers might define an enumerator which has everything it needs
- Consumers might define an enumerator which expects some input (e.g. which resource to query from some API)

Is the strategy pattern still a good choice if each strategy has different requirements?

It seems to me like this would lead to two possible implementations:

Creating the strategy with all context

def strategy
  ActiveRecordStrategy.new(Product.all)
end

Having the strategy delegate to the task for context

def strategy
  ActiveRecordStrategy.new(self)
end

def collection
  Product.all
end

Is one of those what you had in mind? To me, they both seem less simple for consumers than the mixin approach.

Also, the system can know which strategy needs to be used in the system, why are we asking users to specify it?

To automatically select the strategy, we need the collection (so we can check its type), which leads the the awkwardness discussed in #307, where we either need to return a special collection, or override the method that calls collection. Did you have another alternative in mind?

Jan 22 '21 23:01 sambostock

Based on @rafaelfranca 's feedback here and the followup on Slack about changing CsvTask to be an abstract class instead of a mixin, I'm thinking we might want to revisit the solution you suggested with inheritance @sambostock .

EnumeratorTask would be the top-level class in the hierarchy, with CollectionTask and CsvTask both inheriting from it and supplying their own behaviour. We'll reintroduce the concept of abstract class (see our original implementation with abstract_class) to deal with the DescendantsTracker and only show concrete Tasks in the UI.

This would involve a change to our API (Task -> CollectionTask for existing Tasks), unless we alias Task to CollectionTask as Sam mentioned - but tweaking our API a bit this early is not a huge concern to me. What do you think @rafaelfranca ?

Jan 25 '21 14:01 adrianna-chang-shopify

I think I'm leaning towards exposing our own enumerator_builder method but without exposing the job-iteration builder or the cursor.

The enumerator builder should be an object that follows some API and that return an enumerator when asked to based on the cursor that it receives as argument.

Something like:

class MyEnumeratorBuilder
  def initialize(collection)
    @collection = collecion
  end

  def enumerator(cursor:)
    # do what needs to be done and return an object that responds to `each` and uses the cursor
  end
end

Then in the task:

class Task
  def enumerator_builder
    MyEnumeratorBuilder.new(collection)
  end
end

In the Task superclass:

class TaskSuperclass
  def enumerator_builder
      collection = self.collection

      case collection
      when ActiveRecord::Relation
        SomethingThatWillKnowHowToReturnTheActiveRecordEnumerator.new(collection)
      when Array
        SomethingThatWillKnowHowToReturnTheArrayEnumerator.new(collection)
      when CSV
        CsvEnumeratorBuilder.new(collection)
      else
        raise ArgumentError, "#{self.class.name}#collection must be either "\
          'an Active Record Relation, an Array, or a CSV.'
      end
  end

It is similar to what you got in #307, but instead of passing any context to the task about the job-iteration enumerator builder and the curse we have our own API for enumerator builders.

Jan 26 '21 00:01 rafaelfranca

@rafaelfranca your idea is indeed pretty similar to #307, though I had chosen .call so that simple cases could use a proc or lambda. It's not much work for consumers to define their own class though.

Do you foresee a case in which we would have to pass enumerator more than just the cursor? The only one I can think of is passing it JobIteration's enumerator_builder (which we can get around for everything except ThrottleEnumerator by building a new one with nil job), but the reason I ask is that it would be tricky to add an additional keyword argument down the road without breaking backwards compatibility.

However, if we passed in an object which responds to .cursor, then we could easily add more to it in the future while maintaining backwards compatibility. Basically the EnumeratorContext from #307:

    EnumeratorContext = Struct.new(:cursor, keyword_init: true)

Jan 26 '21 01:01 sambostock

I also have strong opinions that we should not pass the Job Iteration's enumerator_builder. I see no reason for that object to exist. If we have a well-defined API for enumerators, we don't need to have a special object that know how to build that API.

The other argument that we may need in the future is the job so we can build the enumerators. I think passing the context object can make easy to add that argument later when we need to.

Jan 26 '21 19:01 rafaelfranca

Lots of moving pieces across the refactors. 😅 Thanks for catching the various things I missed @adrianna-chang-shopify!

Jan 27 '21 17:01 sambostock

@rafaelfranca can you let us know what your thoughts are?

My thoughts are still the same. We should not use inheritance to change behavior by default. We can already infer the behavior based on the collection. Why we do need to ask people to tell us something that we can know already?

I'm ok with people that want custom enumerators using polymorphism to change the enumerator in their tasks, but the cases we support in this gem should not use any different superclass or include any new module (which is a different form of inheritance).

What I'm saying that we should have only one "abstract" class (possibly two with CSVTask but I still think we can remove that class) and only concrete classes in the applications.

Jan 27 '21 19:01 rafaelfranca

Superseded by https://github.com/Shopify/maintenance_tasks/pull/944, so I am going to close this for now.

Jan 19 '24 20:01 gmcgibbon