parsebib
Unmatched full-width parentheses lead to "Unbalanced parentheses" errors
(I am using Spacemacs with a lot of customization, but I assume this is irrelevant. If the following is not enough to reproduce the issue, I will try again with a fresh install / vanilla Emacs.)
Steps to Reproduce
% Failing.bib
@article{Title,
title = {{Title}},
author = {Author},
year = {1970},
journal = {Journal},
abstract = {（} % <-- Culprit
}
% Passing.bib
@article{Title,
title = {{Title}},
author = {Author},
year = {1970},
journal = {Journal},
abstract = {（）} % <-- No error
}
Evaluation results:
(parsebib-parse "/tmp/Passing.bib")
#s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data ("Title" (("abstract" . "（）<–Noerror") ("journal" . "Journal") ("year" . "1970") ("author" . "Author") ("title" . "Title") ("=type=" . "article") ("=key=" . "Title"))))
(parsebib-parse "/tmp/Failing.bib")
Debugger entered--Lisp error: (scan-error "Unbalanced parentheses" 23 145)
scan-sexps(23 1)
forward-sexp(1)
parsebib--match-brace-forward()
parsebib--match-paren-forward()
parsebib-read-entry("article" nil #<hash-table equal 0/65 0x15659cd2c74d> nil t)
parsebib-parse-bib-buffer(:entries #<hash-table equal 0/65 0x15659cd2c72d> :strings #<hash-table equal 0/65 0x15659cd2c74d> :expand-strings t :inheritance t :fields nil :replace-TeX t)
#f(compiled-function (file) #<bytecode -0x424eb6921cb642f>)("/tmp/Failing.bib")
parsebib-parse("/tmp/Failing.bib")
(progn (parsebib-parse "/tmp/Failing.bib"))
elisp--eval-last-sexp(t)
#<subr eval-last-sexp>(t)
#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>)()
eval-sexp-fu-flash-doit-simple(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>))
eval-sexp-fu-flash-doit(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>))
esf-flash-doit(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>) #f(compiled-function (&rest args2) #<bytecode 0xa28255960f219d0>))
ad-Advice-eval-last-sexp(#<subr eval-last-sexp> t)
apply(ad-Advice-eval-last-sexp #<subr eval-last-sexp> t)
eval-last-sexp(t)
eval-print-last-sexp(nil)
funcall-interactively(eval-print-last-sexp nil)
command-execute(eval-print-last-sexp)
Expected behavior
The parser should treat full-width characters as normal text instead of syntactic elements.
P.S. Both Failing.bib and Passing.bib pass validation by biber (via biber --tool -V Failing.bib and biber --tool -V Passing.bib).
This is probably the result of parsebib using forward-sexp to find the end of a BibTeX entry, but I'll need to look into it before I can say for sure.
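A minimal sketch of that mechanism, assuming (as discussed further down in this thread) that the braces are matched under bibtex.el's bibtex-braced-string-syntax-table. The helper name my/try-skip-braced-value is hypothetical and not part of parsebib; it just runs forward-sexp from an opening brace in a temporary buffer and reports either the position after the matching brace or the scan-error data.

```elisp
(require 'bibtex)

;; Hypothetical helper, not part of parsebib: skip over a braced value with
;; `forward-sexp' under the syntax table bibtex.el uses for braced strings.
;; Returns the position after the closing brace, or the `scan-error' data
;; when the value cannot be balanced.
(defun my/try-skip-braced-value (text)
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-syntax-table bibtex-braced-string-syntax-table
      (condition-case err
          (progn (forward-sexp 1) (point))
        (scan-error err)))))

;; ASCII "(" has punctuation syntax in that table, so the braces match
;; and point ends up after the closing brace (position 4).
(my/try-skip-braced-value "{(}")

;; A fullwidth "（" still counts as an opening paren, so this is expected
;; to return scan-error data similar to the backtrace above.
(my/try-skip-braced-value "{\N{FULLWIDTH LEFT PARENTHESIS}}")
```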
Oh, wait a sec. This is not a normal opening parenthesis, it's a CJK character! I hadn't noticed that right away.
You'll notice that if you have an unclosed ASCII parenthesis, it actually works.
This may actually be a bug in Emacs (bibtex.el, to be more precise): parsebib uses the syntax table bibtex-braced-string-syntax-table during parsing, which turns the parentheses () into ordinary punctuation instead of characters that must come in pairs. This allows it to ignore any unmatched parentheses in field values. However, the (CJK) fullwidth parentheses do not have their syntax class set to punctuation, so parsebib tries to match them, which it cannot do.
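To check this diagnosis in your own Emacs, the sketch below (relying only on the built-in char-syntax and with-syntax-table) lists the syntax class of the ASCII and fullwidth parentheses under bibtex-braced-string-syntax-table. If the analysis above is right, the ASCII pair should report "." (punctuation), while the fullwidth pair reports "(" and ")" until a workaround is applied.

```elisp
(require 'bibtex)

;; `char-syntax' returns the syntax class designator of a character in the
;; current syntax table: ?. for punctuation, ?\( and ?\) for parentheses.
;; The result is an alist of (CHARACTER . CLASS) strings.
(with-syntax-table bibtex-braced-string-syntax-table
  (mapcar (lambda (ch) (cons (string ch) (string (char-syntax ch))))
          (list ?\( ?\)
                ?\N{FULLWIDTH LEFT PARENTHESIS}
                ?\N{FULLWIDTH RIGHT PARENTHESIS})))
```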
So arguably, bibtex-braced-string-syntax-table should handle non-ASCII parentheses as well, because bibtex-mode can't handle them either: if I open Failing.bib in Emacs and press C-c C-c, I get the error user-error: Syntactically incorrect BibTeX entry starts here.
I've created an Emacs bug here: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68477
In the meantime, you should be able to add FULLWIDTH LEFT PARENTHESIS and FULLWIDTH RIGHT PARENTHESIS to bibtex-braced-string-syntax-table yourself:
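(with-eval-after-load 'bibtex
  ;; Give the fullwidth parentheses U+FF08 and U+FF09 punctuation syntax,
  ;; as bibtex.el already does for the ASCII ( and ).
  (modify-syntax-entry ?\N{FULLWIDTH LEFT PARENTHESIS} "." bibtex-braced-string-syntax-table)
  (modify-syntax-entry ?\N{FULLWIDTH RIGHT PARENTHESIS} "." bibtex-braced-string-syntax-table))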
I suspect there are more non-ASCII parentheses that would need to be added to bibtex-braced-string-syntax-table, so if any of those are problematic for you as well, you can add them in the same way.
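Rather than adding characters one by one, a broader variant could look up every non-ASCII character that currently has parenthesis syntax in bibtex-braced-string-syntax-table and turn it into punctuation. This is an untested sketch, not the fix from the Emacs bug report. It assumes that scanning the Basic Multilingual Plane (U+0080 to U+FFFF) is enough to catch the brackets you are likely to meet, and it leaves the ASCII braces { } alone because they lie below U+0080.

```elisp
(with-eval-after-load 'bibtex
  ;; Sketch (assumption: scanning U+0080..U+FFFF is sufficient).
  ;; Any non-ASCII character that has open- or close-paren syntax in
  ;; `bibtex-braced-string-syntax-table' is given punctuation syntax,
  ;; mirroring what bibtex.el does for the ASCII ( and ).
  (with-syntax-table bibtex-braced-string-syntax-table
    (let ((ch #x80))
      (while (<= ch #xFFFF)
        (when (memq (char-syntax ch) '(?\( ?\)))
          (modify-syntax-entry ch "." bibtex-braced-string-syntax-table))
        (setq ch (1+ ch))))))
```

Compared with the two-line workaround, this also neutralizes any other non-ASCII bracket pairs your Emacs treats as parentheses, at the cost of a one-off scan when bibtex.el is loaded.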
This is only a work-around, of course, but it should help you deal with your issue until a new Emacs version is released with the fix.