
Optimize the `--numeric` feature for line/region re-indentation in editors

Open gpetiot opened this issue 4 years ago • 1 comments

~Wait for #1639 to be merged so the slicing optimization can be reviewed independently!~

This is a reboot of #1207, whose time overhead made the feature clunky; it takes inspiration from #1484 to reduce that overhead.

We don't need to parse invalid files outside of this feature, so the `--format-invalid-files` option has been removed. Instead, this PR defines a new `--numeric=X-Y` option that should have the same semantics as `--numeric --lines=X-Y` in ocp-indent.

The idea is:

  • if the program is too big (it exceeds a given number of lines), parsing the whole file would be too time-consuming, so:

    • try to slice the input file into a smaller program, keeping only the set of structure/signature items encompassing the X-Y range of lines. (Note: the implementation of this feature is a bit hackish, but it should give nice results on most inputs, assuming the input file is already "almost" formatted.)
    • try to parse either the sliced source, the recovered source (if the first one failed, using menhir recovery parsing), or the original source (if both previously failed)
    • if the parsing is successful, we format the AST, then match the location tree of the formatted AST against the location tree of the unformatted AST to get the indentation produced by ocamlformat [1]
    • if the parsing failed, we use the ocp-indent API to produce an indentation
  • if the program is not too big:

    • try to parse either the source, or the recovered source (if the first one failed, using menhir recovery parsing)
    • if the parsing is successful, we format the AST, then match the location tree of the formatted AST against the location tree of the unformatted AST to get the indentation produced by ocamlformat [1]
    • if the parsing failed, we use the ocp-indent API to produce an indentation
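The control flow above can be sketched as follows. This is a minimal illustration, not the PR's OCaml code: every name here (`slice_items`, `indent_range`, `slicer`, `parse`, `recover`, `indent_from_ast`, `fallback_indent`, `max_lines`) is hypothetical, parsers are assumed to return `None` on failure, and running the recovery parser on the sliced text in the large-file case is an assumption, since the description does not say which text it receives.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (first_line, last_line) of a top-level item, 1-indexed and inclusive.
Item = Tuple[int, int]


def slice_items(items: Sequence[Item], start: int, end: int) -> List[Item]:
    """Keep only the top-level items whose line span overlaps [start, end]."""
    return [(lo, hi) for (lo, hi) in items if lo <= end and hi >= start]


def indent_range(
    source: str,
    start: int,
    end: int,
    *,
    max_lines: int,
    slicer: Callable[[str, int, int], str],
    parse: Callable[[str], Optional[object]],
    recover: Callable[[str], Optional[object]],
    indent_from_ast: Callable[[object, int, int], List[int]],
    fallback_indent: Callable[[str, int, int], List[int]],
) -> List[int]:
    """Try each parsing strategy in order; fall back if none succeeds."""
    if len(source.splitlines()) > max_lines:
        # Large file: sliced source, then recovery, then the original source.
        sliced = slicer(source, start, end)
        attempts = [
            lambda: parse(sliced),
            lambda: recover(sliced),
            lambda: parse(source),
        ]
    else:
        # Small file: plain parse, then recovery.
        attempts = [lambda: parse(source), lambda: recover(source)]

    for attempt in attempts:
        ast = attempt()
        if ast is not None:
            # Stands in for "format the AST and match location trees".
            return indent_from_ast(ast, start, end)

    # Every parse failed: stands in for the ocp-indent fallback.
    return fallback_indent(source, start, end)
```

The sketch leaves out the remapping of line numbers between the sliced text and the original file, which a real implementation would have to handle.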

Notes: [1] This does not return optimal results when formatting merges multiple lines into one. For example, "let x =\n a\nin\n ...." is formatted as "let x = a in\n ...", with the first three lines on a single line; this implementation cannot detect that and returns the same indentation for all 3 lines, which does not look good. ocp-indent is called on this range of lines to improve the indentation.
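To make the problem in note [1] concrete, here is a small sketch of one possible way to spot merged lines and substitute a fallback indentation for them. It is not what the PR does: the note does not say how the affected range is chosen, and the source-line-to-formatted-line mapping (`src_to_fmt`) and both indentation tables are hypothetical inputs.

```python
from collections import Counter
from typing import Dict, Set


def merged_lines(src_to_fmt: Dict[int, int]) -> Set[int]:
    """Source lines that share their formatted line with another source line."""
    counts = Counter(src_to_fmt.values())
    return {src for src, fmt in src_to_fmt.items() if counts[fmt] > 1}


def combine(
    src_to_fmt: Dict[int, int],
    fmt_indent: Dict[int, int],
    fallback_indent: Dict[int, int],
) -> Dict[int, int]:
    """Use the formatter's indentation unless the line was merged with others."""
    merged = merged_lines(src_to_fmt)
    return {
        src: fallback_indent[src] if src in merged else fmt_indent[fmt]
        for src, fmt in src_to_fmt.items()
    }


# "let x =\n a\nin\n ..." : source lines 1-3 collapse into formatted line 1.
mapping = {1: 1, 2: 1, 3: 1, 4: 2}
result = combine(mapping, {1: 0, 2: 0}, {1: 0, 2: 2, 3: 0, 4: 0})
```

With these inputs, lines 1 to 3 take their indentation from the fallback table instead of all inheriting the indentation of the single merged line.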

cc @charlesetc @olydis @emillon

gpetiot avatar Feb 23 '21 14:02 gpetiot

This one is finally ready for review @emillon @Julow, and ready to be tested @charlesetc @olydis

gpetiot avatar Apr 08 '21 20:04 gpetiot

This is too ad-hoc. Should use a legit parser instead.

gpetiot avatar Aug 24 '22 20:08 gpetiot