Rubberduck icon indicating copy to clipboard operation
Rubberduck copied to clipboard

Implementing extract method

Open bclothier opened this issue 6 years ago • 13 comments

This is going to be first iteration of restoring Extract Method functionality that was disabled from RD, this time NOT regex'ing stuff.

Because there are numerous cases to test, we want to get start with most simplest, dumbest implementation.

Basically, the first iteration should aim to:

  1. expand the selection to complete statements a) verify it's within a single procedure b) verify for complicating factors (e.g. GoTos, Exits, ....)
  2. collect all local variables used
  3. copy the statements
  4. show the dialog a) user choose whether variables are parameters or local b) user names the method
  5. add the Dim statements for local variables
  6. insert the method at end of module
  7. replace the original selection with a call to the method, passing in parameters

Note that with 1b, we will disallow selections where it includes flow of control statements that can go to some places not necessarily within the user's selection. That possibly could also include error handling, and of course any Exit, GoTo, GoSub.

Maybe the initial iteration can indicate why it can't extract method (e.g. Exit statement exists in the selection somewhere) so at least it's easy for user to manually fix while we make it smarter in subsequent iterations?

Thoughts? Suggestions?

bclothier avatar Sep 05 '17 23:09 bclothier

I'm not sure there's such a thing as as dumb implementation of this refactoring, but I agree it's a very complex one to tackle, with a ton of traps along the way. An iterative implementation seems like a very good idea.

The original implementation was rather naive, but did a few things pretty well.

Selection

The refactoring works off the user's selection in the active code pane... like all refactorings, really. The difference is that here we aren't working with a selected Declaration or IdentifierReference, we're working with a multi-line QualifiedSelection that encompasses multiple statements.

All refactorings need an ICommand implementation, which is constructor-injected into the corresponding CommandMenuItem - there's a naming convention here: FooCommand gets injected into FooCommandMenuItem, so ExtractMethodCommand would go into ExtractMethodCommandMenuItem. The command's CanExecute implementation determines under which conditions the command can be executed; when the method returns false, the corresponding menu item is disabled. Because every single command needs to evaluate its CanExecute result just before the code pane's context menu is displayed, the CanExecute method must be optimized for performance as much as possible, so as to avoid introducing a noticeable delay in the VBE displaying its context menu.

While it definitely sucks that a disabled command doesn't explain why it's disabled, I think consistency matters, and if all commands are contextually disabled, then Extract Method should be no different; IMO it's the wiki's job (or the website's? the RD News blog's? Stack Overflow?) to explain how to use a given command and why it's disabled under such or such condition.

So when should the refactoring be enabled?

  • Selection begins and ends inside the body of a single member procedure. The DeclarationFinder already contains a method that returns the selected declaration - we could write something similar to cache/return the member procedure that's selected, if applicable (return null if it's not applicable), so validating this point would amount to nothing more than a null-check.
  • Selection does not contain references to line numbers or line labels outside of it. That's trickier, but then again the DeclarationFinder should be helpful, given line numbers and labels each have a corresponding Declaration - we can inspect their References collection and compare their Selection to the user's selection using userSelection.Contains(labelRef.Selection) and userSelection.Contains(labelDeclaration.Selection) to validate that all line/label jumps are self-contained.
  • Selection does not jump out of the procedure scope. So Exit Sub/Function/Property in the selection would disable the refactoring. I don't see a way to get these to work, even in future iterations.
  • Selection does not include partial blocks. We can't extract the selection into its own method if the selection includes an If without an End If, or a For without a Next, or a With without an End With.

I think that covers it.


Variables (and constants) in scope

The refactoring needs to know about what's in-scope for the selection:

  • Locals declared and used inside the selection, but not outside of it, should be moved to the extracted procedure.
  • Locals declared inside the selection but only used outside of it, should be moved closer to usage, i.e. not be extracted, just moved somewhere underneath the selection, as close as possible to their first use.
  • Locals declared before the selection, only used inside the selection, should be moved to the extracted procedure.
  • Parameters used inside the selection should be passed as parameters to the extracted procedure. If passed ByRef and assigned within the selection, should be passed ByRef as well.

Mechanics

So the refactoring looks at what's in-scope, and before it can pop a preview box and prompt for a name, it needs to:

  • Determine the possible inputs; these would be things declared outside the selection, and referenced within.
    • Inputs that are assigned and then used in the original/calling procedure after the selection, must be passed by reference (ByRef).
    • Inputs should otherwise be passed ByVal by default.
  • Determine the possible outputs; these would be things assigned within the selection, and referenced after it.
    • If there's only one possible output value, we extract a Function procedure.
    • If there are multiple output values, we extract a Function procedure, pick/infer which to return (let the user select from a dropdown), and pass other return values ByRef.
    • If there is no possible output, we make a Sub.
  • Determine the body of the extracted method; the method's signature, the locals that move into the new procedure, the selected content. The resulting procedure should be indented accordingly with the indenter settings.

As the user specifies a name for the extracted method, and picks/confirms inputs and outputs, the preview box updates accordingly.

Okaying the preview actually creates the previewed member immediately underneath the calling procedure it was extracted from, and replaces all lines in the selection with a call to that extracted method. For a function, the selected return value determines which local variable receives the returned value. A re-parse is then requested on the parser state.

The extracted method is always Private.

retailcoder avatar Sep 06 '17 03:09 retailcoder

Oh, also - the refactoring needs to be able to work off a QualifiedSelection, so that we can invoke it programmatically, say, from a hypothetical quick-fix that suggests extracting a GoSub...Return subroutine into its own method... ref. issue #13 (that's right - one from the original wishlist!)

retailcoder avatar Sep 06 '17 03:09 retailcoder

Edge case for Exit Sub: if Exit Sub is a the end of the selection, we can probably exclude it from our selection. Not sure whether it's worth the hassle though.

If the Exit Sub is in a conditional, we would also exclude the conditional block, so long as the Exit is the only statement in the block.
Any other placement of Exit [..] can not be heuristically removed

Vogel612 avatar Sep 06 '17 10:09 Vogel612

Thanks for the additional points; good to think about them.

I suppose the preview could look something similar to this: Resharper dialog

Just to be clear, the first iteration will be dumb in the sense it won't try to refactor the variables from original procedure. It'll collect all identifiers and thus leave up to the user to decide whether it's a new local variable, a new input, or a new output.

Subsequent iterations will be a bit more smarter, automatically deleting variables from the original procedure that are now unused after extraction, auto-guessing which's output and which's parameters or handling edge cases of Exits. But I think the first version should be manual. Easier to automate clutches when they exist. :)

bclothier avatar Sep 06 '17 11:09 bclothier

The dialog seems to have disappeared, but this is what it looked like in v1.4.3:

687474703a2f2f692e737461636b2e696d6775722e636f6d2f46685577742e706e67

retailcoder avatar Sep 06 '17 13:09 retailcoder

As for variables in the original method, as long as the refactoring doesn't leave the code uncompilable, it's fine: a variable that's declared but never assigned or referred to will be the target of an inspection result after reparse anyway.

retailcoder avatar Sep 06 '17 13:09 retailcoder

Agreed, especially on the point that we must never write uncompilable code. That's just bad juju.

Two questions.

  1. Given that we must be quick'n'nimble in CanExecute, is it allowable to short-circuit some of the checks which may mean that the menu item is enabled but may fail to execute when we do additional analysis? Not saying that I will do that but just want to know if this is an option in case the analysis requires too much bit twiddling. If it's allowable, do we have a convention for that scenario?

  2. Just confirming that QualifiedSelection would allow me to expand the selection to the entire statements? Think about how block comment work - it's not necessary to select the actual start / end; you can start somewhere in middle of the first line, highlight n lines, and stop somewhere in middle of the last line. Commenting or uncommenting would then expand to the start/end. I think it's much easier doing that way (even though dem : could kink up the works)

bclothier avatar Sep 06 '17 14:09 bclothier

  1. Let's try to include all checks, and if there's a problem, then we can consider moving the less-common checks to the Execute implementation, and display a message box that explains why the selection isn't valid.
  2. A QualifiedSelection is nothing more than a Selection with a QualifiedModuleName, IOW it tells you selection coordinates and what module it's in (the Selection by itself isn't concerned with what module it belongs to).

To be honest, the part about identifying code blocks is the hard part. We might even need to implement some caching somewhere (seems doing that in the DeclarationFinder would be abusing its purpose), so that we have a very fast/efficient way of telling, given a QualifiedSelection, whether the selection is encompassing a whole block or not.

I wouldn't expand the user's selection: we should work with what the user is giving us.

If it's invalid, it's invalid.

e.g.

x   If foo < bar Then
x   'do stuff
x   '....
    End If

Should not be expanded to:

x   If foo < bar Then
x   'do stuff
x   '....
x   End If

However convenient that might be. At least not in the first iteration, when other more important cases aren't handled.

In order to figure out code blocks, we'll need a parse tree listener (to pick up starting & ending line positions of all code blocks) that runs during one of the resolver passes (could be together with the "identifier references pass" I guess), and something to abstract the blocks; perhaps a QualifiedSelection would be enough, but if additional metadata is needed then perhaps a small CodeBlock class or struct would do the trick.

retailcoder avatar Sep 06 '17 15:09 retailcoder

Actually, I wasn't talking about expanding selections vertically - only horizontally. Not going to try to add/remove extra lines; only to expand the start to the column 0 and the end to the column

I do think we have to implement a way to know when it's a statement for extract method to be meaningful. It simply won't any sense if we don't work at the statement level. That's also why I think selection has to be expanded at least to the statement. Otherwise, it'll be asking too much from the user to tell them that they must select everything from If to End Ifand notfootoEnd`

bclothier avatar Sep 06 '17 15:09 bclothier

Oh - in my mind the selection columns are simply ignored; we work with the StartLine and EndLine of the user's selection, simple as that :wink:

retailcoder avatar Sep 06 '17 15:09 retailcoder

@retailcoder calling for statement separator code weirdnesses with users... should be strongly documented what we assume

Vogel612 avatar Sep 06 '17 19:09 Vogel612

@Vogel612 :+1: good point - we can keep that in mind for a future iteration!

retailcoder avatar Sep 06 '17 19:09 retailcoder

@Vogel612 assign to me, add to epic

bclothier avatar Nov 08 '17 00:11 bclothier