darker icon indicating copy to clipboard operation
darker copied to clipboard

Darker changes lines when the previous lines got changed

Open wkentaro opened this issue 3 years ago • 5 comments

Describe the bug

  • Darker changes lines when the previous lines got changed.
  • It happens when there are trailing closing parenthesis of the function call.

To Reproduce

darker_tests.zip

unzip darker_tests.zip
cd darker_tests
git --no-pager diff
darker -l 79 spam.py --diff

Expected behavior

  • Only the line that was changed by editing gets updated. (in the below screenshot, only the first call of func1)

Screenshots image

Environment (please complete the following information):

  • OS: macOS
  • Python version: 3.9.10
  • Git version [e.g. 2.36.0]
% git --version
git version 2.32.1 (Apple Git-133)
hub version refs/heads/master
  • Darker version [e.g. 1.5.0]
% darker --version
1.5.0
  • Black version [e.g. 22.3.0]
% black --version
black, 22.1.0 (compiled: yes)

wkentaro avatar Aug 29 '22 16:08 wkentaro

This is due to difflib not splitting these 2 as different chunks.

In [3]: list(opcodes_to_chunks(opcodes, src, dst))
Out[3]:
[(1,
  ('def func1(a_long_long, b_long_long, c_long_long, d_long_long):',
   '    return a_long_long + b_long_long + c_long_long + d_long_long',
   '',
   '',
   'def main():'),
  ('def func1(a_long_long, b_long_long, c_long_long, d_long_long):',
   '    return a_long_long + b_long_long + c_long_long + d_long_long',
   '',
   '',
   'def main():')),
 (6,
  ('    e = func1(a_long_long=999, b_long_long=2000, c_long_long=3000, d_long_long=4000)',
   '    f = func1(a_long_long=1001, b_long_long=2001, c_long_long=3001, d_long_long=4001)'),
  ('    e = func1(',
   '        a_long_long=999, b_long_long=2000, c_long_long=3000, d_long_long=4000',
   '    )',
   '    f = func1(',
   '        a_long_long=1001, b_long_long=2001, c_long_long=3001, d_long_long=4001',
   '    )')),
 (8,
  ('    print(e, f)', '', '', 'if __name__ == "__main__":', '    main()'),
  ('    print(e, f)', '', '', 'if __name__ == "__main__":', '    main()'))]

wkentaro avatar Aug 29 '22 16:08 wkentaro

Thanks @wkentaro, this is a great summary and example case for a tricky problem. Since difflib doesn't understand the Python syntax, it's impossible for it to reliably tell where statement boundaries are.

In #221, I attempted a more involved approach by digging into the internals of Black and getting a stream of original/reformatted chunks directly from there, but I very quickly ran into a dead end. There was a fundamental reason why this can't be done in a foolproof way even with Black having access to the AST. Unfortunately I didn't write down the details of it, but it would probably be obvious if we went through it again. It had something to do with indented blocks and the extents of AST nodes.

If you're interested to ponder about the problem, we could continue discussion here or maybe in a call.

akaihola avatar Aug 29 '22 18:08 akaihola

Black has evolved quite a bit since I worked on #221, but essentially I tried to reimplement what today is in black._format_str_once() and turn it into a generator of (<original line(s)>, <reformatted line(s)>) pairs instead of collecting all reformatted lines into a string.

I was hoping to get a more granular list of chunks than what difflib can give us by comparing the results of reformatting to the original.

But as said, I failed (or at least was defeated by something that seemed insurmountable).

akaihola avatar Aug 29 '22 18:08 akaihola

Also, having to reimplement parts of Black internals would of course make maintenance of Darker vastly heavier for obvious reasons – at least if we'd like to allow users to benefit from features of new Black versions going forward.

akaihola avatar Aug 29 '22 18:08 akaihola

Oh, I think I now remember a very very difficult case: non-standard indentation.

Let's say you have this file:

if True:
 for x in range(5):
  try:
   print(10 / x)
  except ZeroDivisionError:
   print("Can't divide")

Currently difflib bundles all the reindented lines into one chunk, and if you e.g. change only the fourth line to print(20/x), Darker will simply still reformat the whole file.

The dream with more granular handling would of course be that the "wrong" indentation be retained and other reformatting applied. A naïve implementation would simply cause the indentation to not match the surrounding code and be either invalid or not match the original AST. The outermost indentation level could perhaps be reverted in a post-processing step to match surrounding code, but somehow I feel there are going to be nasty edge cases where this would break apart.

akaihola avatar Aug 29 '22 19:08 akaihola