nbdime
nbdime copied to clipboard
git-nbdiffdriver fails to parse branch1:file1..branch2:file2 syntax
I came across this issue while maintaining the https://github.com/yandexdataschool/Practical_RL repository, so I will provide examples based on it.
With the following setting in ~/.gitconfig
[diff "jupyternotebook"]
command = git-nbdiffdriver diff
the following command fails:
$ git diff master:week07_seq2seq/practice_tf.ipynb..coursera:week6_outro/seq2seq/practice_tf.ipynb
usage: git-nbdiffdriver [-h] [--version] [--config] [--log-level {DEBUG,INFO,WARN,ERROR,CRITICAL}] {diff,webdiff,config} ...
git-nbdiffdriver: error: unrecognized arguments: index da535d8..a2c89e8 100644
fatal: external diff died, stopping at week07_seq2seq/practice_tf.ipynb
Apparently, either git-nbdiffdriver does not parse its arguments properly, or git does not pass them correctly.
nbdiff-web also has a similar issue:
$ nbdiff-web master:week07_seq2seq/practice_tf.ipynb..coursera:week6_outro/seq2seq/practice_tf.ipynb
Traceback (most recent call last):
File "/home/ser/.local/bin/nbdiff-web", line 10, in <module>
sys.exit(main())
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/webapp/nbdiffweb.py", line 96, in main
return main_diff(opts)
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/webapp/nbdiffweb.py", line 79, in main_diff
base, remote, path = resolve_diff_args(opts)
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/args.py", line 406, in resolve_diff_args
if not is_gitref(base):
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/gitfiles.py", line 77, in is_gitref
is_valid_gitref(candidate)
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/gitfiles.py", line 86, in is_valid_gitref
repo.commit(ref)
File "/usr/lib/python3.7/site-packages/git/repo/base.py", line 466, in commit
return self.rev_parse(text_type(rev) + "^0")
File "/usr/lib/python3.7/site-packages/git/repo/fun.py", line 322, in rev_parse
obj = obj[rev[start:]]
File "/usr/lib/python3.7/site-packages/git/objects/tree.py", line 298, in __getitem__
return self.join(item)
File "/usr/lib/python3.7/site-packages/git/objects/tree.py", line 225, in join
item = tree[token]
File "/usr/lib/python3.7/site-packages/git/objects/tree.py", line 298, in __getitem__
return self.join(item)
File "/usr/lib/python3.7/site-packages/git/objects/tree.py", line 244, in join
raise KeyError(msg % file)
KeyError: "Blob or Tree named 'practice_tf.ipynb..coursera:week6_outro' not found"
Needless to say, without the ~/.gitconfig setting git works as expected.
The git docs says that the diff driver is supposed to be called with the following arguments:
path old-file old-hex old-mode new-file new-hex new-mode
I wonder if this is a newer feature, or an older one. Which version of git do you use?
$ git --version
git version 2.22.0
This syntax is documented in git-diff docs:
For a more complete list of ways to spell
<commit>, see "SPECIFYING REVISIONS" section in gitrevisions[7].
Please note that the .. syntax indicates range. git-diff docs say this should be synonymous to passing the two endpoints ("This is synonymous to the previous form."). With git-nbdiffdriver, both result in the same error:
$ git diff master:week07_seq2seq/practice_tf.ipynb..coursera:week6_outro/seq2seq/practice_tf.ipynb
usage: git-nbdiffdriver [-h] [--version] [--config] [--log-level {DEBUG,INFO,WARN,ERROR,CRITICAL}] {diff,webdiff,config} ...
git-nbdiffdriver: error: unrecognized arguments: index 6681040..b9a8a11 100644
fatal: external diff died, stopping at week07_seq2seq/practice_tf.ipynb
$ git diff master:week07_seq2seq/practice_tf.ipynb coursera:week6_outro/seq2seq/practice_tf.ipynb
usage: git-nbdiffdriver [-h] [--version] [--config] [--log-level {DEBUG,INFO,WARN,ERROR,CRITICAL}] {diff,webdiff,config} ...
git-nbdiffdriver: error: unrecognized arguments: index 6681040..b9a8a11 100644
fatal: external diff died, stopping at week07_seq2seq/practice_tf.ipynb
However, with nbdiff-web the two produce different exceptions:
$ nbdiff-web master:week07_seq2seq/practice_tf.ipynb..coursera:week6_outro/seq2seq/practice_tf.ipynb
Traceback (most recent call last):
File "/home/ser/.local/bin/nbdiff-web", line 10, in <module>
sys.exit(main())
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/webapp/nbdiffweb.py", line 96, in main
return main_diff(opts)
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/webapp/nbdiffweb.py", line 79, in main_diff
base, remote, path = resolve_diff_args(opts)
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/args.py", line 406, in resolve_diff_args
if not is_gitref(base):
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/gitfiles.py", line 77, in is_gitref
is_valid_gitref(candidate)
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/gitfiles.py", line 86, in is_valid_gitref
repo.commit(ref)
File "/usr/lib/python3.7/site-packages/git/repo/base.py", line 466, in commit
return self.rev_parse(text_type(rev) + "^0")
File "/usr/lib/python3.7/site-packages/git/repo/fun.py", line 322, in rev_parse
obj = obj[rev[start:]]
File "/usr/lib/python3.7/site-packages/git/objects/tree.py", line 298, in __getitem__
return self.join(item)
File "/usr/lib/python3.7/site-packages/git/objects/tree.py", line 225, in join
item = tree[token]
File "/usr/lib/python3.7/site-packages/git/objects/tree.py", line 298, in __getitem__
return self.join(item)
File "/usr/lib/python3.7/site-packages/git/objects/tree.py", line 244, in join
raise KeyError(msg % file)
KeyError: "Blob or Tree named 'practice_tf.ipynb..coursera:week6_outro' not found"
$ nbdiff-web master:week07_seq2seq/practice_tf.ipynb coursera:week6_outro/seq2seq/practice_tf.ipynb
Traceback (most recent call last):
File "/home/ser/.local/bin/nbdiff-web", line 10, in <module>
sys.exit(main())
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/webapp/nbdiffweb.py", line 96, in main
return main_diff(opts)
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/webapp/nbdiffweb.py", line 79, in main_diff
base, remote, path = resolve_diff_args(opts)
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/args.py", line 415, in resolve_diff_args
if is_gitref(base) and not is_gitref(remote):
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/gitfiles.py", line 77, in is_gitref
is_valid_gitref(candidate)
File "/home/ser/.local/lib/python3.7/site-packages/nbdime/gitfiles.py", line 86, in is_valid_gitref
repo.commit(ref)
File "/usr/lib/python3.7/site-packages/git/repo/base.py", line 466, in commit
return self.rev_parse(text_type(rev) + "^0")
File "/usr/lib/python3.7/site-packages/git/repo/fun.py", line 322, in rev_parse
obj = obj[rev[start:]]
File "/usr/lib/python3.7/site-packages/git/objects/tree.py", line 298, in __getitem__
return self.join(item)
File "/usr/lib/python3.7/site-packages/git/objects/tree.py", line 225, in join
item = tree[token]
File "/usr/lib/python3.7/site-packages/git/objects/tree.py", line 298, in __getitem__
return self.join(item)
File "/usr/lib/python3.7/site-packages/git/objects/tree.py", line 244, in join
raise KeyError(msg % file)
KeyError: "Blob or Tree named 'practice_tf.ipynb^0' not found"
The first of these errors might suggest that tokenization is broken.
This syntax is documented in git-diff docs:
For a more complete list of ways to spell
, see "SPECIFYING REVISIONS" section in gitrevisions[7].
Thanks for the links. However, that is the docs for the CLI of git itself. In their section on external diff drivers (this page under the heading "Defining an external diff driver") it says:
To define an external diff driver jcdiff, add a section to your $GIT_DIR/config file (or $HOME/.gitconfig file) like this:
[diff "jcdiff"] command = j-c-diffWhen Git needs to show you a diff for the path with diff attribute set to jcdiff, it calls the command you specified with the above configuration, i.e. j-c-diff, with 7 parameters, just like GIT_EXTERNAL_DIFF program is called. See git[1] for details.
As such, git breaks their own docs by passing .. notation to external diff tools. Ideally this should be reported to git. If they have updated their API, I would strongly prefer them to publish a new spec for that before attempting to implement support.
However, with nbdiff-web the two produce different exceptions:
Neither the .. nor <ref>:<path> syntax is currently supported by the nbdiff[-web] CLI. We use GitPython under the hood, so if that package has support for any of these, then PRs that adds support for this in nbdime will be very welcome :)
Before reporting the problem to [email protected], I would like to ensure that this is indeed a bug. I have created a small repo to reproduce the problem: https://github.com/dniku/git-external-diff-argv Could you verify that what this repo describes is actually a bug?
Whether it is a bug or simply outdated documentation might not be very clear. I would of course also recommend that you search the mailing list archives first for similar reports (it might even be fixed).
Other than that the report seems sane from my view, but I would maybe link directly to the relevant header: https://www.git-scm.com/docs/git/2.22.0#Documentation/git.txt-codeGITEXTERNALDIFFcode
Submitted here.
Would you like to to file a separate issue for lack of support for <ref>:<path> in nbdiff[-web] to keep track of that problem?
Would you like to to file a separate issue for lack of support for :
in nbdiff[-web] to keep track of that problem?
We can do that issue here, since the other issue has been pushed upstream.
My (somewhat arbitrary) guess is that perhaps it was intended to use .rev_parse() instead of .commit() in https://github.com/jupyter/nbdime/blob/f96ac65/nbdime/gitfiles.py#L86.
In [1]: from git import Repo
In [2]: repo = Repo('.')
In [3]: repo.rev_parse('origin/branch1:file1.txt')
Out[3]: <git.Blob "802b1c4ed7b06162b2ce09b7db72a576695b96e5">
In [4]: repo.commit('origin/branch1:file1.txt')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-4-eb9fabe894f3> in <module>
----> 1 repo.commit('origin/branch1:file1.txt')
/usr/lib/python3.7/site-packages/git/repo/base.py in commit(self, rev)
464 return self.head.commit
465 else:
--> 466 return self.rev_parse(text_type(rev) + "^0")
467
468 def iter_trees(self, *args, **kwargs):
/usr/lib/python3.7/site-packages/git/repo/fun.py in rev_parse(repo, rev)
320 obj = obj.tree
321 # END get tree type
--> 322 obj = obj[rev[start:]]
323 parsed_to = lr
324 else:
/usr/lib/python3.7/site-packages/git/objects/tree.py in __getitem__(self, item)
296 if isinstance(item, string_types):
297 # compatibility
--> 298 return self.join(item)
299 # END index is basestring
300
/usr/lib/python3.7/site-packages/git/objects/tree.py in join(self, file)
242 join_path(self.path, info[2]))
243 # END for each obj
--> 244 raise KeyError(msg % file)
245 # END handle long paths
246
KeyError: "Blob or Tree named 'file1.txt^0' not found"
rev_parse() can successfully resolve the <ref>:<path> notation to a Blob. Can that be used by nbdime?
rev_parse()can successfully resolve the<ref>:<path>notation to aBlob. Can that be used by nbdime?
Sounds like that is what we want. The logic for where we currently use commit would need to be updated to support all the return types from rev_parse though: https://gitpython.readthedocs.io/en/stable/reference.html#git.repo.base.Repo.rev_parse
For the record, while upstream is deciding what to do with this, I am using the following workaround:
#!/usr/bin/env python
import sys
import subprocess
if len(sys.argv) == 10:
old_filename, old_file, old_hex, old_mode, new_file, new_hex, new_mode, new_filename, wtf = sys.argv[1:]
subprocess.check_call([
'nbdiff',
old_file,
new_file,
])
else:
assert False
and
env GIT_EXTERNAL_DIFF=./wrap_nbdiff.py git diff ...
It has a weird issue: before showing the actual diff, the following error is printed multiple times:
fatal: cannot run ./wrap_nbdiff.py: No such file or directory
fatal: external diff died, stopping at before
but it works, at least.
(nbdiff can be also replaced with nbdiff-web, which does not trigger this weird error and also looks better, but that's not exactly an equivalent for git diff)
Jeff King writes:
The last two are:
when the destination path differs from the source path, we append it. This is generally a sign of a rename/copy, though it triggers in the blob case because Git has been manually given a pair of files with non-matching names.
the final one is metainfo that Git typically prints between the "diff" header and the diff itself. This is added only when we add the filename, and would generally carry information about the rename (but of course since there isn't one, it has only the index line).
These were both added by 427dcb4bca ([PATCH] Diff overhaul, adding half of copy detection., 2005-05-21). I think the intent was that existing diff commands would/could ignore the extra arguments.
Certainly the world would be a better place if those were described in the external-diff documentation (specifically in relation to renames, which is their intended use).
I guess this can be used as reference for git-nbdiffdriver CLI args.
~Seems reasonable, although I would also hope to get an explanation for the arguments passed when using .., and how that overlaps with everything else.~
Never mind the last comment :). Do you want to have a go at fixing this?
Huh. Strangely, it seems that not only the rename_to arg is present in the latest version, but it has been there since the inception (and you were the author of that first commit).
I confirm that the addition of a single line
diff_parser.add_argument('rename_metadata', type=Path, nargs='?', default=None, action=SkipAction)
in the add_git_diff_driver_args() function resolves the
usage: git-nbdiffdriver [-h] [--version] [--config] [--log-level {DEBUG,INFO,WARN,ERROR,CRITICAL}] {diff,webdiff,config} ...
git-nbdiffdriver: error: unrecognized arguments: index 6681040..b9a8a11 100644
error for me.
#498 should fix one of the issues discussed here. I'm not yet sure that I want to take a stab at integrating support for diffing blobs into nbdiff.
I'm not yet sure that I want to take a stab at integrating support for diffing blobs
That's ok, but be aware that my TODO pile is quite high at the moment 😞
In the meantime, it seems that the following alias in ~/.gitconfig is a workaround for the nbdiff-web issue:
[alias]
jdiff = "!f() { GIT_EXTERNAL_DIFF='bash -c \"nbdiff-web $2 $5\"' git diff \"$@\"; }; f"