jj Conflicts at the end of files that don't end in a newline are not materialized correctly

trafficstars

jj new
echo -n "aa" > qq
jj new
echo -n "bb" > qq
jj new @-
echo -n "cc" > qq
jj new all:@-+
cat qq

Actual Result

<<<<<<< Conflict 1 of 1
%%%%%%% Changes from base to side #1
-aa+cc+++++++ Contents of side #2
bb>>>>>>> Conflict 1 of 1 ends

If the file is subsequently edited, jj cannot parse the conflict correctly.

Expected Behavior

I'd materialize the conflict as though the files ended in newlines, if we decide this is acceptable. We could perhaps notify the user about this in the labels:

<<<<<<< Conflict 1 of 1 (extra newlines inserted)
%%%%%%% Changes from base to side #1
-aa
+cc
+++++++ Contents of side #2
bb
>>>>>>> Conflict 1 of 1 ends

If this is not acceptable, I'm not sure what is best to do.

Jun 26 '24 04:06 ilyagr

I think pretending that the file had newlines is my preference. This will only matter if the user edits the conflicts manually, in which case the file was probably meant to end in a newline anyway.

There is one alternative on my mind, but I don't like it:

The fish shell prints ⏎\n when a command's output does not end in a newline:

$ echo -n aa
aa⏎
$ # Next command here

We could in principle use a trick like that (either with ⏎\n or with \nJJ: NO NEWLINE\n), but I'm not sure it's worth it. Also, we then lose the ability to represent a file that ends with ⏎\n. It's also a curious question what to do if a conflict not at the end of file includes ⏎\n.

Jun 26 '24 04:06 ilyagr

Mercurial adds "\ No newline at end of file" in diffs. We could copy that. I'm fine with either that or adding just a newline as you suggested.

Jun 26 '24 04:06 martinvonz

I like the idea of using the existing label syntax; that avoids introducing any more ambiguity with change contents that isn’t already present. The two sides could disagree on the presence of the newline, right? If putting it at the end of the label line attached to the relevant text is too subtle, maybe another label line could be added?

Jun 26 '24 06:06 emilazy

IIUC the main idea in the thread so far is to only render the lack of a newline, but not necessarily be able to operate on it in a principled fashion?

Shouldn't the lack of a newline be encoded in the conflict marker syntax itself? It seems syntactically important. Wouldn't you want to preserve the invariant that you can reconstruct any of the file sides directly from the conflicted contents, rather than having to consult an out-of-band store?

Jun 26 '24 16:06 arxanas

UPDATE: I decided to promote this to a separate issue, https://github.com/martinvonz/jj/issues/3975.

Wouldn't you want to preserve the invariant that you can reconstruct any of the file sides directly from the conflicted contents, rather than having to consult an out-of-band store?

This issue has been nerd-sniping me for a while; in addition to trailing newlines, it's also a matter of encoding conflict markers inside files

TODO: find link. I remember discussing it with Martin recently in some PR, haven't found it yet. TLDR: jj currently can't handle conflicts in out own tutorial.md because of

https://github.com/martinvonz/jj/blob/638649b435795bbb991d840e9ad1750648d9b74a/docs/tutorial.md?plain=1#L284-L291

In the shortest term, I'll probably ignore both issues and just add newlines to the end of files.

However, I think I finally figured out a syntax that will work, will not require a complicated parser, and even be somewhat readable. Before computing conflicts (and only if a file is conflicted), we'd encode file contents as follows:

A mildly pathological file

JJ Verbatim Line:<<<<<<<<<<<
blahblah
JJ EOF: No newline at the end of file

A moderately pathological file

JJ Verbatim Line:<<<<<<<<<<<
JJ Verbatim Line:JJ Verbatim Line:<<<<<<<<<<<
blahblah
JJ Verbatim Line:JJ EOF: No newline at the end of file

Another fun file

blah
JJ Verbatim Line:JJ EOF: No newline at the end of file
JJ EOF: No newline at the end of file

Fun, huh?

This relies on conflict markers having to start at the beginning of the line. If we ever have a notion of conflicts that happen in the middle of lines, we'd need a different syntax, of course. I think that would be nice, but also would probably need a whole different UI to be usable; I'm not sure it can be very usable in the terminal or as a text file.

Jun 26 '24 19:06 ilyagr

Somebody asked a few days ago whether this will handle the similar problem with diffs, but then seemed to delete their comment. Shamelessly stealing their example from my email, it was (after minor edits):

$ jj diff
Modified regular file .gitignore:
   1    1: build
        2: ignoreAdded regular file .vscode/settings.json:
        1: {
        2:     "spellright.language": [
        3:         "en"
        4:     ],
        5: }

I don't actually have an answer (it may or may not be part of the same solution, and it should look similar), but we certainly shouldn't forget about that.

Oct 06 '24 01:10 ilyagr

Hi, that was me, and I deleted it because I noticed I was a version behind, so I upgraded and 0.21.0 handles it just fine, so it seems like there's no extra work to be done here!

Oct 07 '24 12:10 bwinton

I think @yuja may have fixed it without me noticing. Thank you, Yuya!

Oct 08 '24 05:10 ilyagr

It seems that our colored_words diff is currently a bit lossy in this case. This wouldn't often come up, but it might be confusing occasionally.

We could add a message (e.g. "Added a newline at the end of file" in green or "Removed a newline at the end of file" in red), perhaps, to clarify when there is a diff in whether there is a newline. I don't know whether I'd get to it anytime soon, though.

I also think the jj diff --git output is not the clearest, but it's identical to git diff, so we should probably keep it.

Oct 08 '24 06:10 ilyagr

Another alternative for the colored words diff would be to do what the fish shell does, and represent a lack of a newline at EOL by the ⏎ symbol:

I'm not sure how clear this is, but at least there is precedent.

(Also, this entire problem is less important than the original problem of malformed diff)

Oct 08 '24 06:10 ilyagr

It seems that our colored_words diff is currently a bit lossy in this case.

Yes, maybe we can add some marker or header/footer message. Its output doesn't have to be parsable, so we can freely choose the syntax.

Oct 08 '24 06:10 yuja

I have another possible solution that I just thought of, but I'm not sure if it's too complicated. Basically, we introduce a new conflict marker \\\\\\\ that indicates that the immediate previous conflict's trailing newlines should be deleted. It would always come immediately after the end of a conflict, so that it's clear to the user that it's a special marker that they will need to delete along with the conflict markers. If a non-conflicted line is missing a newline at the end of the file, we would just leave it as-is, since it's not part of a conflict.

Here's an example of what a conflict might look like, where the base and side 1 don't have a newline at the end of the file, but side 2 adds a newline:

<<<<<<< Conflict 1 of 1
%%%%%%% Changes from base to side #1
-base
+left
+++++++ Contents of side #2
right

>>>>>>> Conflict 1 of 1 ends
\\\\\\\ No newline after conflict

Or if the base has a newline at the end of the file, but both sides remove it:

<<<<<<< Conflict 1 of 1
+++++++ Contents of side #1
left
%%%%%%% Changes from base to side #2
-base
-
+right
>>>>>>> Conflict 1 of 1 ends
\\\\\\\ No newline after conflict

So basically if any non-empty side/base of the conflict is missing a trailing newline, then we add a newline to every side/base in the conflict before we materialize it, and we add a \\\\\\\ marker after we materialize it to indicate that these added newlines should be ignored when parsing.

I think this has some nice benefits:

It's consistent with the other markers we add to the file, so the user will recognize it as something they need to remove while resolving the conflict.
Just like the other conflict markers, it can be expanded to have a longer length if there's already a line in the file which starts with \\\\\\\, so we don't have to worry about any special escaping logic.
I think it's a bit easier to read, since if I see a diff where a blank line is added/removed, it has a pretty clear meaning. Whereas if I see a diff where \JJ EOF: No newline at end of file is added/removed, it's a bit confusing since removing \JJ EOF: No newline at end of file would mean adding a newline.

Nov 25 '24 01:11 scott2000

I don't have a strong opinion, but I think abusing comment part might be also good, something like +++++++ Contents of side #1 (noeol) or %%%%%%% Changes from base to side #2 (+noeol).

Maybe we can also restore the EOL state from the original conflict contents (without encoding it in the conflict marker.) It's unlikely that user wants to modify EOL in conflicts without resolving it.

Nov 25 '24 03:11 yuja

@scott2000: I think it's a bit easier to read, since if I see a diff where a blank line is added/removed, it has a pretty clear meaning. Whereas if I see a diff where \JJ EOF: No newline at end of file is added/removed, it's a bit confusing since removing \JJ EOF: No newline at end of file would mean adding a newline.

How is that different in the \\\\\\\ No newline after conflict scheme? Don't you still add/remove a line that corresponds to the inverse of the content that's present?

@yuja: I don't have a strong opinion, but I think abusing comment part might be also good, something like +++++++ Contents of side #1 (noeol) or %%%%%%% Changes from base to side #2 (+noeol).

I think it might be nice to modify the conflict marker format in the same line, because we can add any number of metadata parts without needing a corresponding number of extra metadata lines. Some other attributes we could include in this way:

Line ending format (LF vs CRLF)
Diff format (line, word, whitespace handling, amount of context, etc.)
- Perhaps we could somehow produce/consume native Git diff formats as well, if we marked things explicitly in this way?
Original commit/tree ID

Of course, if we choose to store structured metadata in this way, we should establish a real specification for the format.

Nov 25 '24 19:11 arxanas

I think it's a bit easier to read, since if I see a diff where a blank line is added/removed, it has a pretty clear meaning. Whereas if I see a diff where \JJ EOF: No newline at end of file is added/removed, it's a bit confusing since removing \JJ EOF: No newline at end of file would mean adding a newline.

How is that different in the \\\\\\\ No newline after conflict scheme? Don't you still add/remove a line that corresponds to the inverse of the content that's present?

I meant for within the conflicts markers themselves under the %%%%%%% diff section. Like if a side added a trailing newline, I think that this:

%%%%%%% Changes from base to side #1
-base
+left
+

Is clearer than this:

%%%%%%% Changes from base to side #1
-base
-\JJ EOF: No newline at end of file
+left

I think @yuja was suggesting also it could be displayed like this:

%%%%%%% Changes from base to side #1 (-noeol)
-base
+left

Which I also think is a bit clearer, but it suffers from the same double-negative (to me, -noeol means "side 1 removed not-EOL", which actually means "side 1 added EOL"). That may be unavoidable though, since the default is to have an EOL, so it does make sense to only mark the cases where the default isn't happening. Maybe it would be better for me to think of it as "the removed term has the property 'noeol'", which could be more useful in general if we support other properties in the future.

Nov 25 '24 20:11 scott2000

I've been thinking about this a bit more, and now I'm thinking @yuja's idea of just restoring the "noeol" state from the existing conflict data is the best approach, rather than encoding it in the conflict markers and parsing it in some way. Here's my thoughts:

I think parsing part of the "note" section of the conflict markers could be surprising to users (since usually the "note" part of the conflict markers doesn't matter), and it would reduce compatibility with existing tools that generate Git-style conflict markers (since these tools wouldn't emit the "noeol" note in the way the we expect, so they would end up discarding the "noeol" state by accident). I don't think accidentally losing this information would be a big issue, but it definitely seems nicer if we can avoid it.

If we were to parse the "noeol" state from the conflict markers, then we would also have to decide how to handle the case where the "noeol" markers appear in the middle of a file rather than at the end, which seems more complicated than it needs to be.

Also, we already rely on the current conflict state for other things (determining the number of sides to expect and distinguishing deleted sides from empty sides), so it seems reasonable to also resolve this the same way (it seems especially similar to the deleted file case to me). If we want to come up with a way to encode all of that information in the conflict markers directly, then that seems like a separate issue to me.

If we restore the "noeol" state from the tree, we can still add a note to the conflict markers indicating that the conflict involved a "noeol" side, just to make it easier to understand for the user. I've implemented this approach in a draft PR here: #5186.

Jan 17 '25 03:01 scott2000

@scott2000 I saw https://github.com/jj-vcs/jj/pull/5186 was put up for review. I don't want to block that PR, but I'd like to know about certain holistic usability/design considerations. Let me know if this was already discussed somewhere.

Some questions:

Did the term noeol come directly from Git? Do a lot of users already know what it means?
Do we care about making the term understandable/intuitive when it appears in these kinds of conflicts?
- Specifically: We could reasonably say that we don't care if we expect that noeol situations are uncommon in practice.
Isn't it a bit difficult to reason about what -noeol means due to the double-negative?
Is there a contextual clue that I as a user can use to determine that [stuff] is semantically relevant, unlike the text Changes from base to side #1 (I assume)?
- Or have I misunderstood the design, and it's not semantically-relevant?
Is the design generalizable to other kinds of unrenderable conflicts?
- For example: Can we express file mode changes in this way? Should we add an executable bit flag as well?
- For example: If we're hypothetically resolving conflicts Git-style submodules, we could render something like hash: xxx in a similar tag, and then select one of them to keep (or produce an altogether new hash).
- For example: Could we insert explicit copy/move/rename markers in this way?
- Is it intended to eventually have multiple different bracketed attributes? Multiple of the same kind?
- Is there a design for defining key-value metadata, rather than boolean metadata?
- Is the design only relevant for hunk-level metadata, rather than file-level metadata?

Taking a look at the example conflict with [-noeol], I'm putting myself in the shoes of a new user and I can't imagine I would understand what noeol means, or how to resolve it. For reference:

    <<<<<<< Conflict 1 of 1
    %%%%%%% Changes from base to side #1 [-noeol]
    -base
    +++++++ Contents of side #2 [noeol]
    right
    >>>>>>> Conflict 1 of 1 ends
    "###

We might be able to improve the UX in a different design. To take one example (which I'm not necessarily proposing strongly), in the below rendering, I could make a guess that setting false to true (or deleting that line altogether) in Contents of side #2 might let me adjust the trailing newline as part of the conflict resolution:

    <<<<<<< Conflict 1 of 1
    %%%%%%% Changes from base to side #1
    -base
    \\\\\\\ trailing_newline: true
    +++++++ Contents of side #2
    right
    \\\\\\\ trailing_newline: false
    >>>>>>> Conflict 1 of 1 ends
    "###

It's still not necessarily obvious that modifying the conflict markers is semantically-meaningful, but I think I would be more likely to do the right thing. (Or we could even add \\\\\\\ comment: set below to false or true in order to toggle newline to make it explicit 😉.)

Jan 26 '25 23:01 arxanas

@arxanas Thanks for the interesting questions!

Did the term noeol come directly from Git? Do a lot of users already know what it means?

I think I've seen it used in Vim, but I don't think it's used in Git. We could definitely find a better term for it, or use more explicit [no terminating newline]. I don't have any strong opinions. Whatever we decide on, we could put it in the documentation in case anyone finds it confusing.

Do we care about making the term understandable/intuitive when it appears in these kinds of conflicts?

Specifically: We could reasonably say that we don't care if we expect that noeol situations are uncommon in practice.

I think it would be good if it's intuitive, but if it seems like a pretty rare situation considering that jj currently generates completely invalid conflict markers and only a couple people have ever run into it from what I know.

Isn't it a bit difficult to reason about what -noeol means due to the double-negative?

Yes, that is definitely a concern. I don't have any strong preference here. Perhaps [terminating newline added] and [terminating newline removed] would be clearer?

Is there a contextual clue that I as a user can use to determine that [stuff] is semantically relevant, unlike the text Changes from base to side #1 (I assume)?

Or have I misunderstood the design, and it's not semantically-relevant?

I wouldn't consider the [noeol] comments to be semantically relevant, since they aren't being parsed by jj (instead we're just keeping the noeol state from the current tree when parsing). I don't think it would be relevant to most users either, since most programming languages don't care whether a file ends in a newline. Git actually doesn't even add any comment at all in its materialized conflict markers when this situation happens (as far as I can tell), so we're mostly adding it as a convenience to the user (so that they aren't confused about what changed if they see two identical sides).

If we were going to parse the comments, I think it would be reasonable to say "anything enclosed in square brackets at the end of the conflict marker line is semantically meaningful", but I could also see benefits to using a separate marker like \\\\\\\ as you suggested (it would certainly be harder to accidentally delete an entire line, but it also would reduce interoperability with tools which expect Git-style markers only).

Is the design generalizable to other kinds of unrenderable conflicts?

No, it is not (at least not as currently implemented). I was thinking that it's better to implement this simpler solution in the short-term since it matches the way we handle other unrenderable conflict state currently (like executable bit, deleted files, etc.), and then we could leave a more general design as a future problem which would potentially replace this solution, along with the other currently-unrenderable conflicts.

I'm not entirely convinced that adding more semantic meaning to the conflict markers themselves is the right approach, since it could be unintuitive to a user that editing the contents of a file could change metadata about the file (although that doesn't really apply to the [noeol] case since the newline state is part of the contents of the file).

My general thinking is that it's better to merge the solution that requires the smallest changes first (since it seems like we don't have consensus on the best long-term solution still, and the current behavior of generating invalid conflict markers is definitely wrong), and then when/if we decide on a better solution, it could be replaced.

I'm open to other ideas though, and I'd also be happy to split the commit in the PR which adds the [noeol] comments into a separate PR and make the current PR just focus on fixing the immediate issue of invalid conflict markers without addressing how to present the information to users.

Jan 26 '25 23:01 scott2000

I created a separate issue for this discussion since it has evolved beyond the original scope of this issue: #5496

Jan 27 '25 23:01 scott2000

jj jj copied to clipboard

Conflicts at the end of files that don't end in a newline are not materialized correctly

Actual Result

Expected Behavior

jj
jj copied to clipboard