codewatch icon indicating copy to clipboard operation
codewatch copied to clipboard

Support explicit ordering of visitor functions rather than purely relying on declaration order

Open noahnu opened this issue 6 years ago • 6 comments

Is your feature request related to a problem? Please describe.

Ensuring a strict ordering on the execution of "visitors" is necessary if we want to guarantee deterministic assertions. If visitor_a increments counter_a for some node of type X and visitor_b increments counter_b for node of type X only if counter_a is greater than 0, then changing the ordering that visitor_a and visitor_b are called will change the final value of the computed statistics, meaning assertions may differ from the run that used different ordering.

While typing out this issue I was trying to come up with a good use case for having one visitor function depend on another. I could not. Perhaps someone else can think of an example?

Even if we were to consider this an anti-pattern, we should attempt to minimize possibility of error / flakiness.

Changing to a reducer-style approach for visitors may encourage inter-visitor dependencies as per https://github.com/tophat/codewatch/issues/10.

Currently order of execution of visitors is defined by declaration order, purely b/c we use a central registry with a global array that is appended to whenever the @visit decoration is executed. The use of a global registry seems like an anti-pattern, however in order to remove this registry, we'd need some way to guarantee visitor order (dir(module) sorts alphabetically, while module.__dict__.keys() is only guaranteed to be ordered in py3.6+).

Describe the solution you'd like

A pattern similar to https://pytest-dependency.readthedocs.io/en/latest/about.html#what-is-the-purpose, where you essentially mark the names of tests that must be run first. We can thus use a topological sort of the dependencies.

e.g.

@visit(nodes.FunctionDef)
def some_func(node, .., ..):
  pass

@visit(nodes.FunctionDef, predicate=None, depends_on=['some_func', 'some_other_func'])
def count_funcs(node, .., ..):
  pass

We'd examine "depends_on" to generate a graph and then topological sort. We could sort alphabetically for visitors with no dependencies (or that are tied in sorting order).

Describe alternatives you've considered

  • Use inspect module to identify line numbers of items in dir(). Use this to order. Pro: consistent in all versions of python. Con: probably slow (?) and will grow in complexity when your config is spread across multiple files.
  • Rely on module.dict which is sorted by declaration order in py3.6+ and should remain in whatever arbitrary order it ends up in pre-3.6 (i.e. re-running program should keep same order even if not declaration order). Con: behaviour isn't easy to understand pre-3.6.

Additional context

https://tophat-opensource.slack.com/archives/CE14KJGET/p1543705988030800

noahnu avatar Dec 02 '18 05:12 noahnu

If visitor_a increments counter_a for some node of type X and visitor_b increments counter_b for node of type X only if counter_a is greater than 0, then changing the ordering that visitor_a and visitor_b are called will change the final value of the computed statistics

I can see this being very brittle. A change in visitor_a's behaviour or its removal entirely can completely break visitor_b

francoiscampbell avatar Dec 02 '18 05:12 francoiscampbell

@francoiscampbell yeah. It feels like an anti-pattern. Maybe @lime-green can provide an example of why we want ordering?

noahnu avatar Dec 02 '18 06:12 noahnu

Ordering by declaration order seems the most natural to me, which is what we have right now. I don't think we're looking for inter-dependency between visitors just yet, there hasn't really been a need for it from my usage of the library at least.

IMO should just keep the logic as-is until it's obvious that there's a problem

lime-green avatar Dec 02 '18 07:12 lime-green

Are we talking about the order of all visitors or per-node? Cause I was assuming that the order of all visitors would be the AST depth-first traversal order.

For a single node type, im not opposed to running its visitors in declaration order, although it has the potential to prevent any future changes or optimizations, since the visitor order effectively becomes part of the public API

francoiscampbell avatar Dec 02 '18 17:12 francoiscampbell

@francoiscampbell per-node; it's declaration order right now

lime-green avatar Dec 02 '18 20:12 lime-green

Yeah, that's cool. I'd say let's not make that a guarantee, just an implementation detail for now

francoiscampbell avatar Dec 02 '18 20:12 francoiscampbell