jj
jj copied to clipboard
jj creates divergent operations without any user input whatsoever
Description
jj creates divergent operations (and changes) when just typing into the editor - no commands required to run.
I have a theory on what the problem is (see actual behaviour section)
Steps to Reproduce the Problem
No commands involved, but I believe this is due to core.watchman.register-snapshot-trigger = true. I suspect the feature has race conditions that trigger when:
- You're on a large repo (thus the snapshot takes longer - ~1 second for me)
- You're using an editor that automatically saves files for you very frequently (I'm using vscode)
I get divergent changes in my jj log arbitrarily, after not having run any jj commands.
My op log contains:
@ cbda09db69fd [email protected] 1 minute ago, lasted 1 second
│ snapshot working copy
│ args: jj debug snapshot
○ 4bf1c16327ae [email protected] 1 minute ago, lasted 965 milliseconds
├─╮ reconcile divergent operations
│ │ args: jj debug snapshot
○ │ f6ff969db703 [email protected] 1 minute ago, lasted 932 milliseconds
├───╮ reconcile divergent operations
│ │ │ args: jj debug snapshot
○ │ │ 9594d9b3678a [email protected] 1 minute ago, lasted 1 second
│ │ │ snapshot working copy
│ │ │ args: jj debug snapshot
│ ○ │ 6e6f484ee8d9 [email protected] 1 minute ago, lasted 1 second
│ ├─╯ snapshot working copy
│ │ args: jj debug snapshot
│ ○ 2a563ca5d0de [email protected] 1 minute ago, lasted 1 second
├─╯ snapshot working copy
│ args: jj debug snapshot
○ 680b07ec0f8d [email protected] 4 minutes ago, lasted 1 second
│ snapshot working copy
│ args: jj debug snapshot
○ 88e1d6aa889c [email protected] 4 minutes ago, lasted 1 second
│ snapshot working copy
│ args: jj debug snapshot
...
Expected Behaviour
I expect to not have divergent changes
Actual Behaviour
jj creates two divergent changes. My theory is that:
- the user starts typing "ab". Vscode saves mid-sentence with only the content "a"
- Watchman sees a filesystem change and triggers a snapshot operation
- jj starts to create a new snapshot operation
2a56based off the current op head (680bin the example above) - jj reads the content "a" to the snappshot
2a56 - vscode writes "ab" to the file, triggering watchman's snapshot operation again
- jj starts to create a new snapshot operation
9594based off the current op head (680bin the example above) - jj writes the content "ab" to the snapshot
9594 - Both operations complete, creating divergent changes
- Divergent changes merge
If this theory is correct, then the solution I would propose is to, after completing a snapshot operation from watchman, check if the op head has changed, and if it has, you discard the current operation instead of attempting to reconcile it.
Specifications
- Platform: Linux
- Version: 0.29.0
I think @ilyagr also observed something similar when he used a older version of VisualJJ. We also should forward the env correctly to the watchman trigger as noted by @emilazy some time ago now.
https://github.com/jj-vcs/jj/issues/3747 might be related.
This was my experience with VisualJJ as well, and what caused me to remove it. I tried reproducing it without VisualJJ by interleaving actions manually, but was unable to do it and wrote it off as something about VisualJJ's inotify timing. But I didn't realize VSCode does frequent writes itself; that makes the above scenario seem very plausible.
Background snapshotting processes like VisualJJ or watchman.register-snapshot-trigger inevitably have this problem. To mitigate this class of problems, I think we'll need a wider lock to serialize local operations. This lock will cover the whole snapshot->update_repo->publish_op scope. Since this lock doesn't affect the correctness (= divergence is valid outcome if you ran two jj commands concurrently), it's probably good to ignore/override stale lock file left by crashed jj process.
I wonder if this is the same issue or a separate one. (I'm going back and forth on this)
I start with (Update: Expanded this example to show the gh-readonly-queue commits I already had)
$ jj log --at-op 3ebbe -r 'uxlp-..igtrunk' --no-pager
○ wyskno [email protected] 55 minutes ago gh-readonly-queue/main/pr-6836-5bfa79235d513cd885fb7b4d9f2a597e3e5b9a42@upstream a864eec
│ templater: make Operation object serializable
○ rsupsy [email protected] 55 minutes ago a3b0301
│ op_store: pack start/end_time into TimestampRange struct
○ qpwmpp [email protected] 55 minutes ago a8e5a5b
│ op_store: move TimestampRange type from templater
○ kmqumu [email protected] 55 minutes ago 39119ba
│ templater: inline TimestampRange::duration() -> String
○ tqylsw [email protected] 55 minutes ago gh-readonly-queue/main/pr-6835-3d3159ecd19fd5467d266235180729fd8364b1db@upstream 5bfa792
│ templater: make Commit object serializable
○ syokqt [email protected] 55 minutes ago 8adf0f6
│ templater: make CommitRef object serializable
○ zvtwss [email protected] 55 minutes ago f5424fa
│ merge: derive serde::Serialize for RefTarget
◆ uxlpvy [email protected] 1 hour ago main* 3d3159e
│ templater: make some more value types serializable
~
I then do jj git fetch and end up with the following:
Op log
$ jj op log -d
@ c2a9f7c31124 [email protected] 4 minutes ago, lasted 21 milliseconds
├─╮ reconcile divergent operations
│ │ args: jj log -r oldigtrunk..igtrunk
○ │ efe5291073f7 [email protected] 4 minutes ago, lasted 8 milliseconds
│ │ import git refs
│ │ args: jj op log --color never --quiet --no-graph --limit 1 --template id
│ │
│ │ Changed commits:
│ │ ○ - wyskno hidden a864eec templater: make Operation object serializable
│ │ ○ - rsupsy hidden a3b0301 op_store: pack start/end_time into TimestampRange struct
│ │ ○ - qpwmpp hidden a8e5a5b op_store: move TimestampRange type from templater
│ │ ○ - kmqumu hidden 39119ba templater: inline TimestampRange::duration() -> String
│ │ ○ - tqylsw hidden 5bfa792 templater: make Commit object serializable
│ │ ○ - syokqt hidden 8adf0f6 templater: make CommitRef object serializable
│ │ ○ - zvtwss hidden f5424fa merge: derive serde::Serialize for RefTarget
│ │
│ │ Changed remote bookmarks:
│ │ gh-readonly-queue/main/pr-6835-3d3159ecd19fd5467d266235180729fd8364b1db@upstream:
│ │ + untracked (absent)
│ │ - untracked tqylsw hidden 5bfa792 templater: make Commit object serializable
│ │ gh-readonly-queue/main/pr-6836-5bfa79235d513cd885fb7b4d9f2a597e3e5b9a42@upstream:
│ │ + untracked (absent)
│ │ - untracked wyskno hidden a864eec templater: make Operation object serializable
│ ○ 0c48c0f4eb8c [email protected] 4 minutes ago, lasted 2 seconds
├─╯ fetch from git remote(s) upstream
│ args: jj git fetch --remote upstream --branch main --branch master --branch 'glob:gh-readonly-queue*' --branch 'glob:ig/*'
│
│ Changed commits:
│ ○ + ysoorm 981d0b7 main*| revset: inline expect_<construct>_with() parsing helpers
│ ○ + yvnusy 1f91541 revset: extract expect_<construct>() parsing helpers
│ ○ + umxltz cb5b171 revset: remove unused diagnostics argument from expect_literal()
│ ○ + vsnmsu fda95e5 revset: reimplement alias stack helper without using recursion
│ ○ + ouyuyx d620529 revset: normalize and simplify type error messages
│ ○ + pwrrqm f2b024c revset: fix alias stack of pattern/literal parse error
│
│ Changed local bookmarks:
│ main:
│ + ysoorm 981d0b7 main*| revset: inline expect_<construct>_with() parsing helpers
│ - uxlpvy 3d3159e templater: make some more value types serializable
│
│ Changed remote bookmarks:
│ gh-readonly-queue/main/pr-6835-3d3159ecd19fd5467d266235180729fd8364b1db@upstream:
│ + untracked (absent)
│ - untracked tqylsw 5bfa792 templater: make Commit object serializable
│ gh-readonly-queue/main/pr-6836-5bfa79235d513cd885fb7b4d9f2a597e3e5b9a42@upstream:
│ + untracked (absent)
│ - untracked wyskno a864eec templater: make Operation object serializable
│ main@upstream:
│ + tracked ysoorm 981d0b7 main*| revset: inline expect_<construct>_with() parsing helpers
│ - tracked uxlpvy 3d3159e templater: make some more value types serializable
○ 3ebbe70c159a [email protected] 4 minutes ago, lasted 93 milliseconds
│ rebase commit 05b3400e015fbbeb3484e684fbfcfa05ba67a071
...
Log:
$ jj log -r 'uxlp-..main|main@upstream' --no-pager
○ ysoorm [email protected] 8 minutes ago main* 8a5e6c2
│ revset: inline expect_<construct>_with() parsing helpers
○ yvnusy [email protected] 8 minutes ago e72ebcc
│ revset: extract expect_<construct>() parsing helpers
○ umxltz [email protected] 8 minutes ago 08bc5f4
│ revset: remove unused diagnostics argument from expect_literal()
○ vsnmsu [email protected] 8 minutes ago 4de5791
│ revset: reimplement alias stack helper without using recursion
○ ouyuyx [email protected] 8 minutes ago 158f14a
│ revset: normalize and simplify type error messages
○ pwrrqm [email protected] 8 minutes ago be1d65a
│ revset: fix alias stack of pattern/literal parse error
│ ◆ ysoorm hidden [email protected] 28 minutes ago main@git main@upstream 981d0b7
│ │ revset: inline expect_<construct>_with() parsing helpers
│ ~ (elided revisions)
├─╯
◆ uxlpvy [email protected] 1 hour ago 3d3159e
│ templater: make some more value types serializable
~
Separate question
Update: This is solved in https://github.com/jj-vcs/jj/issues/6830#issuecomment-3011152939, the precedence of operations was not what I thought. 🤦
I find two things about the jj log behavior also weird:
-
In the second one, why were the hidden revisions in between
uxlpandmain@gitnot shown? -
In this one below (which was previously above, but no longer), I made the mistake of using
uxlpwhere I should have haduxlp-, and yet I still got output.
$ jj log --at-op 3ebbe -r 'uxlp..main|main@upstream' --no-pager
◆ uxlpvy [email protected] 1 hour ago main* 3d3159e
│ templater: make some more value types serializable
~
Update 2: In case it matters, my version of jj is roughly at commit 6323764 from 3 days ago.
Background snapshotting processes like VisualJJ or
watchman.register-snapshot-triggerinevitably have this problem. To mitigate this class of problems, I think we'll need a wider lock to serialize local operations. This lock will cover the whole snapshot->update_repo->publish_op scope. Since this lock doesn't affect the correctness (= divergence is valid outcome if you ran two jj commands concurrently), it's probably good to ignore/override stale lock file left by crashed jj process.
I proposed a simple solution above, It would probably look something like this:
enum ResolveStrategy {
DISCARD,
MERGE,
}
struct Operation {
...
strategy: ResolveStrategy
}
ResolveConcurrentChanges(op1, op2) {
if (op1.strategy == Discard) {
abandon(op1);
} else if (op2.strategy == Discard) {
abandon(op2);
} else {
merge(op1, op2);
}
}
Is there a reason we couldn't take an approach like this instead of widening the lock? I'd be concerned that widening the lock could slow down a foreground process of jj if a background one is running already.
$ jj op log -d @ c2a9f7c31124 [email protected] 4 minutes ago, lasted 21 milliseconds ├─╮ reconcile divergent operations │ │ args: jj log -r oldigtrunk..igtrunk ○ │ efe5291073f7 [email protected] 4 minutes ago, lasted 8 milliseconds │ │ import git refs
If you ran jj while background jj/git process is updating Git refs, the concurrent jj process might import intermediate state. I don't know if this was the case, though.
$ jj log -r 'uxlp-..main|main@upstream' --no-pager
Maybe you mean uxlp-..(main|main@upstream)?
I proposed a simple solution above, It would probably look something like this:
enum ResolveStrategy { DISCARD, MERGE, } struct Operation { ... strategy: ResolveStrategy } ResolveConcurrentChanges(op1, op2) { if (op1.strategy == Discard) { abandon(op1); } else if (op2.strategy == Discard) { abandon(op2); } else { merge(op1, op2); } }
If we add metadata to operation, not to operation heads, maybe we'll need to scan all operations down to the common ancestor operation, and discard intermediate operations as needed?
Is there a reason we couldn't take an approach like this instead of widening the lock? I'd be concerned that widening the lock could slow down a foreground process of jj if a background one is running already.
There are other related problems (such as racy git imports/exports) that can be addressed by wider lock.
$ jj log -r 'uxlp-..main|main@upstream' --no-pager
Maybe you mean
uxlp-..(main|main@upstream)?
Indeed, in both cases, thanks.
Do we have a way forward on this or a workaround? This occasionally makes jj nigh unusable on a big repo - especially if I'm trying to clean up a diff or history.
- Make a change
- Workspace stale - cannot do anything.
- Fix / delete lock/ abandon divergent etc
- jj status
- Goto 1
I've considered adjusting the code for reconciling concurrent operations to look for concurrent snapshot operations and keeping only the newest one.
I've considered adjusting the code for reconciling concurrent operations to look for concurrent snapshot operations and keeping only the newest one.
Disclaimer, I don't know the rust code base at all so this idea might be dumb, but seeing all the discussions focusing on the oplog and changes to it, I wonder:
Would it be possible for watchman to simply queue up changes and wait until a previously triggered snapshot operation has completed before requesting the next one? AFAIU, this would avoid concurrent oplog changes (unless the user runs additional jj commands) triggered by running watchman without needing to change jj core logic.
Would it be possible for watchman to simply queue up changes and wait until a previously triggered snapshot operation has completed before requesting the next one? AFAIU, this would avoid concurrent oplog changes (unless the user runs additional jj commands) triggered by running watchman without needing to change jj core logic.
This isn't a problem specific to watchman. I think @mlcui-corp ran into this issue as well without using watchman in a different context, I don't remember exactly what he did to trigger it though.
Same problem here. I've just started using Jujutsu on my local Firefox repo, and I frequently need to do bisects. Sadly, my auto-bisect session often gets interrupted by Internal error: Failed to check out commit [...] Caused by: Concurrent checkout. Would be delighted to see a workaround or solution.
IME "Concurrent checkout" is quite easy to trigger even without watchman. I keep one terminal window permanently running watch jj log (like @martinvonz does in one of his demos). Running something like while cargo c && jj next; end in another window produces empty commits everywhere, stale checkouts, etc. - and you can't even jj undo your way out of it because the oplog is no longer linear.
Anyway, I fixed my issues by aliasing jj to flock .jj/lock jj.
I think we should consider holding a lock across the whole command if we're going to snapshot the working copy (i.e. unless --ignore-working-copy or --at-op was passed). We don't currently do that because - unlike most VCSs - we don't need to do it for correctness. We can still it if it's going to result in better UX, and it seems to me like it would.
Using a lockfile as a "best-effort" measure for reducing divergent ops sounds like a good plan to me. My experience of doing this is mostly positive. I haven't had to clean up my tree or run workspace update-stale since making this alias. The only place I've wished for a more fine-grained lock is when running bisect. It keeps the lock for a long time and you can't run log to see what's going on. Not the end of the world though.
We might be able to get away with a smaller lock than whole operation. it looks like most of the divergence is coming from a race on Git HEAD - attempted to fix #7895
My concern with a lock-based approach to snapshotting is that of performance. Since watchman will trigger an snapshot every time a file is written, which for an IDE such as vscode, can be as frequent as every second or so, this is potentially a large concern to me as someone working on a very large repo.
I've considered adjusting the code for reconciling concurrent operations to look for concurrent snapshot operations and keeping only the newest one.
Was there a reason why this turned out to not be viable?
Workspace snapshotting is already protected by a lock, #7895 just includes interaction with Git HEAD under the same lock.
Regarding performance concerns - probably increasing settle parameter in Watchman config should help?
IME "Concurrent checkout" is quite easy to trigger even without watchman. I keep one terminal window permanently running
watch jj log(like @martinvonz does in one of his demos). Running something likewhile cargo c && jj next; endin another window produces empty commits everywhere, stale checkouts, etc. - and you can't evenjj undoyour way out of it because the oplog is no longer linear.Anyway, I fixed my issues by aliasing
jjtoflock .jj/lock jj.
The only place I've wished for a more fine-grained lock is when running bisect. It keeps the lock for a long time and you can't run log to see what's going on.
You should be able to just run watch jj log --ignore-working-copy without locking.
This will not create new snapshots or op log entries for you (e.g. if you modified a file via command line that will not be inspected), but it will keep showing you updates about the effects of other jj operations you have run so far, and works also during bisect.
That's a good tip, thanks!
@matts1, the fix for Git HEAD landed - would be great if you could install jj from latest main and see if the original problem with watchman still reproduces. Thanks!
I'm not familiar enough with your PR to know whether it was the cause of it, but my jj was hanging, and I managed to fix it by deleting both .jj/working_copy/working_copy.lock and .jj/repo/git_import_export.lock. Is there any chance that hitting control-C while it's snapshotting could leave the lock there so future snapshots can't succeed?
Is there any chance that hitting control-C while it's snapshotting could leave the lock there
On Windows, yes. On Unix, a lock file with no living processes is supposed to be ignored.
Thanks for testing @matts1! There was discussion of potential deadlock from the use of two locks but I'm not sure that is it yet.
Are you able to reproduce jj hanging or did it happen only once? Can you please share more details and the steps leading that hang, if you remember?
It makes sense that it hangs after a stray lock file is left on disk due to ^C, right? That's my understanding of what happened.
It makes sense that it hangs after a stray lock file is left on disk due to ^C, right? That's my understanding of what happened.
I haven't been able to reproduce that (on macOS). We are using flock on Unix, which would release the lock when all file descriptors are close on process termination.
@matts1 is this repo on a network filesystem by any chance? (Semantics of flock may be different there.)
It's only happened once, so I'm not sure.
- I'm running linux
- It's not a network filesystem
- It was on the jj repo itself, so not an especially large one.
- I'm not sure precisely what I was doing, but it did occur while I was working on some code related to snapshotting of the working copy. I don't believe I was touching any of the locking code though.
- I was mixing and matching using a stable jj version and this new jj version
- Was using the old version of jj most of the time
- Was testing out a jj I'd built
- Visualjj was running in the background on an old version of jj
I'm not confident it's your PR causing the problem. I can try and switch to the new version of jj on a more permanent basis to try it out and see if I can reproduce it.
I was able to reproduce this problem. I'm not entirely sure what the root cause is, but:
- This time I was using a combination of
jjbuilt from head and google's internal stable jj, which was last synced from upstream on November 15. - I was able to verify that removing
git_import_export.lockdid not solve the problem, but removingworking_copy.lockafter that did.
Let's close this because the divergence bug should be fixed. Feel free to open a new bug for the lock file issue.