Allow single-quoted strings to be treated the same as double-quoted strings

Open eduard93 opened this issue 9 months ago • 5 comments

Great tool!

(1) A description of the issue. A screenshot is often helpful too.

I'm comparing two Python files where most of the changes are swaps between single and double quotes, which should be ignored. However, difftastic reports the quote changes as differences.

[Screenshot: difftastic output showing the quote changes as differences]

(2) A copy of what you're diffing. I'm diffing files.

Before:

```python
def my_func():
    print("Hello World")
```

After:

```python
def my_func():
    print('Hello World')
```

Expected result: no differences.
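For Python specifically, the two versions really are equivalent: Python's parser does not record which quote character a string used, so both files produce the same abstract syntax tree. The check below uses the standard-library `ast` module; it only illustrates that semantic equivalence and is not how difftastic compares code.

```python
import ast

before = '''def my_func():
    print("Hello World")
'''

after = """def my_func():
    print('Hello World')
"""

# ast.dump() renders the parsed tree as text; quote style is not part of
# the AST, so both sources produce identical dumps.
same_ast = ast.dump(ast.parse(before)) == ast.dump(ast.parse(after))
print(same_ast)  # True
```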

(3) The version of difftastic you're using (see difft --version) and your operating system.

Difftastic 0.58.0 (0c92771 2024-05-10, built with rustc 1.65.0), running on Windows 11.

eduard93, May 15 '24 16:05

This is intentional, I'm afraid. Difftastic is a syntactic differ, and semantic information about which string literals are equivalent is out of scope. In some languages, such as bash or PHP, single-quoted and double-quoted strings are different because they follow different interpolation rules.

Wilfred, May 23 '24 11:05

Fair. Any chance you could point me to where I'd make this change in my fork? Something like the place where difftastic gets the list of changes to output? I want to manually remove single/double quote differences.

eduard93, May 23 '24 18:05

@eduard93, you should be able to change what's considered the content of the Atom when it's a string here:

https://github.com/Wilfred/difftastic/blob/b88b4056203cdd3075cd341595411195671a163b/src/parse/syntax.rs#L411

If it's an AtomKind::String, drop the first and last characters of its content.
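The suggestion amounts to comparing string atoms by their text minus the outer delimiters. The real change would go in difftastic's Rust code at the link above; this Python sketch only illustrates the idea with a hypothetical `strip_quotes_naive` helper, assuming the atom's content is the literal exactly as written in the source.

```python
def strip_quotes_naive(literal: str) -> str:
    """Drop the first and last character of a string literal's source text.

    Hypothetical helper mirroring the "drop the first and last characters"
    idea; it assumes every string has one-character delimiters.
    """
    return literal[1:-1]


# Single- and double-quoted literals now compare equal.
print(strip_quotes_naive('"Hello World"') == strip_quotes_naive("'Hello World'"))  # True

# Triple-quoted literals keep two quote characters on each side,
# so '''...''' and """...""" still differ.
print(strip_quotes_naive('"""Hi"""'))  # ""Hi""
print(strip_quotes_naive("'''Hi'''"))  # ''Hi''
```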

Wilfred, May 28 '24 15:05

Thinking about this some more, I think this is a worthwhile addition, but language configurations should opt in to it.
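One way to picture a per-language opt-in: normalization applies only to string atoms in languages whose configuration enables it, so bash and PHP keep their quote-sensitive behaviour. The table and function names below are hypothetical, and the stripping itself is the naive one-character version discussed earlier in the thread.

```python
# Hypothetical opt-in table: languages where quote style never changes a
# string's meaning. Bash and PHP are left out because their single- and
# double-quoted strings interpolate differently.
QUOTE_INSENSITIVE_LANGUAGES = {"python"}


def atom_content(literal: str, is_string: bool, language: str) -> str:
    """Return the text used when comparing an atom (hypothetical sketch)."""
    if is_string and language in QUOTE_INSENSITIVE_LANGUAGES:
        return literal[1:-1]  # naive: assumes one-character delimiters
    return literal


print(atom_content("'hi'", True, "python") == atom_content('"hi"', True, "python"))  # True
print(atom_content("'$HOME'", True, "bash") == atom_content('"$HOME"', True, "bash"))  # False
```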

Wilfred, Jul 16 '24 22:07

Dropping the first and last characters is not very clean, I'm afraid. Python itself has string delimiters of variable length. Multiline strings (syntactically different from ordinary " or ' strings, but not from each other) are delimited by three identical quote characters:

```python
my_string = """Hello, World!
I love multiline strings.
"""
other_string = '''Hello, World!
I love multiline strings.
'''

assert my_string == other_string
```
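A delimiter-aware variant would measure the opening delimiter instead of assuming it is one character. It also has to set aside Python's string prefixes (r, b, f, u), which do change meaning. The sketch below is hypothetical: the function name and its (prefix, body) return value are illustrative, not difftastic code, and it works on the literal's source text.

```python
def strip_python_delimiters(literal: str) -> tuple[str, str]:
    """Split one Python string literal into (prefix, body) without its quotes.

    Sketch assumptions: `literal` is the source text of a single well-formed
    literal (no implicit concatenation). Escapes are compared as written.
    """
    i = 0
    # Skip a string prefix such as r, b, f or rb; it is returned separately
    # because it changes the literal's meaning.
    while i < len(literal) and literal[i] not in "'\"":
        i += 1
    prefix, rest = literal[:i], literal[i:]
    # Check the three-character delimiters first, so '''x''' is not mistaken
    # for a single-quoted string.
    for quote in ('"""', "'''", '"', "'"):
        if len(rest) >= 2 * len(quote) and rest.startswith(quote) and rest.endswith(quote):
            return prefix, rest[len(quote):-len(quote)]
    raise ValueError(f"not a string literal: {literal!r}")


my_src = '"""Hello, World!\nI love multiline strings.\n"""'
other_src = "'''Hello, World!\nI love multiline strings.\n'''"
print(strip_python_delimiters(my_src) == strip_python_delimiters(other_src))  # True
print(strip_python_delimiters('r"x"'), strip_python_delimiters('"x"'))  # ('r', 'x') ('', 'x')
```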

The good news is that tree-sitter understands this, so maybe the opt-in could take the left and right nodes around the string contents instead? Here is the generated tree; note the "string_start" and "string_end" nodes.

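[Screenshot: tree-sitter parse tree for the strings above, showing "string_start" and "string_end" nodes]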
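With delimiter nodes available, the opt-in could ignore them rather than guessing their length. A minimal sketch, assuming a string node's children are available as (node_type, text) pairs. The child lists here are written by hand, not produced by tree-sitter; only "string_start" and "string_end" come from the tree above, and "string_content" as the name of the middle node is an assumption.

```python
# Hypothetical stand-in for the children of a tree-sitter `string` node.
Node = tuple[str, str]  # (node_type, source_text)

DELIMITER_NODES = {"string_start", "string_end"}


def string_body(children: list[Node]) -> str:
    """Concatenate the text of every child except the delimiter nodes."""
    return "".join(text for kind, text in children if kind not in DELIMITER_NODES)


double = [("string_start", '"""'), ("string_content", "Hello, World!\n"), ("string_end", '"""')]
single = [("string_start", "'''"), ("string_content", "Hello, World!\n"), ("string_end", "'''")]

print(string_body(double) == string_body(single))  # True
```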

ekorchmar, Jul 21 '24 13:07