parsebib

Unmatched full-width braces lead to "Unbalanced parentheses" errors

Open gudzpoz opened this issue 1 year ago • 3 comments

(I am using Spacemacs and have tons of customization, but I assume this is irrelevant? If the following is not enough to reproduce the issue, I will try again in a fresh install / vanilla emacs.)

Steps to reproduce

% Failing.bib
@article{Title,
  title = {{Title}},
  author = {Author},
  year = {1970},
  journal = {Journal},
  abstract = {（} % <-- Culprit
}
% Passing.bib
@article{Title,
  title = {{Title}},
  author = {Author},
  year = {1970},
  journal = {Journal},
  abstract = {（）} % <-- No error
}

Evaluation results:

(parsebib-parse "/tmp/Passing.bib")
#s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data ("Title" (("abstract" . "（）<–Noerror") ("journal" . "Journal") ("year" . "1970") ("author" . "Author") ("title" . "Title") ("=type=" . "article") ("=key=" . "Title"))))

(parsebib-parse "/tmp/Failing.bib")
Debugger entered--Lisp error: (scan-error "Unbalanced parentheses" 23 145)
  scan-sexps(23 1)
  forward-sexp(1)
  parsebib--match-brace-forward()
  parsebib--match-paren-forward()
  parsebib-read-entry("article" nil #<hash-table equal 0/65 0x15659cd2c74d> nil t)
  parsebib-parse-bib-buffer(:entries #<hash-table equal 0/65 0x15659cd2c72d> :strings #<hash-table equal 0/65 0x15659cd2c74d> :expand-strings t :inheritance t :fields nil :replace-TeX t)
  #f(compiled-function (file) #<bytecode -0x424eb6921cb642f>)("/tmp/Failing.bib")
  parsebib-parse("/tmp/Failing.bib")
  (progn (parsebib-parse "/tmp/Failing.bib"))
  elisp--eval-last-sexp(t)
  #<subr eval-last-sexp>(t)
  #f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>)()
  eval-sexp-fu-flash-doit-simple(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>))
  eval-sexp-fu-flash-doit(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>))
  esf-flash-doit(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>) #f(compiled-function (&rest args2) #<bytecode 0xa28255960f219d0>))
  ad-Advice-eval-last-sexp(#<subr eval-last-sexp> t)
  apply(ad-Advice-eval-last-sexp #<subr eval-last-sexp> t)
  eval-last-sexp(t)
  eval-print-last-sexp(nil)
  funcall-interactively(eval-print-last-sexp nil)
  command-execute(eval-print-last-sexp)

Expected behavior

The parser should treat full-width characters as normal text instead of syntactic elements.

P.S. Both Failing.bib and Passing.bib pass validation by biber (via biber --tool -V Failing.bib / biber --tool -V Passing.bib).

gudzpoz • Jan 14 '24 07:01

This is probably the result of parsebib using forward-sexp to find the end of a BibTeX entry, but I'll need to look into it before I can say for sure.

joostkremers • Jan 15 '24 09:01

Oh, wait a sec. This is not a normal opening parenthesis, it's a CJK character! I hadn't noticed that right away.

You'll notice that if you have an unclosed ASCII parenthesis, it actually works.

This may actually be a bug in Emacs (bibtex.el, to be more precise). parsebib uses the syntax table bibtex-braced-string-syntax-table during parsing, which turns the ASCII parentheses () into normal punctuation instead of characters that need to be paired. That allows it to ignore any unmatched parentheses in field values. However, the (CJK) fullwidth parentheses （） don't have their syntax class set to punctuation, so parsebib tries to match them, which it cannot.

So arguably, bibtex-braced-string-syntax-table should deal with non-ASCII parentheses as well, because bibtex-mode can't handle them either. If I open Failing.bib in Emacs and then do C-c C-c, I get the error user-error: Syntactically incorrect BibTeX entry starts here.
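The explanation above can be checked without parsebib. The following is a minimal sketch, assuming only that bibtex.el is available; the helper name my/bibtex-brace-scan-ok-p is hypothetical and not part of parsebib or Emacs. It runs forward-sexp over a braced string under bibtex-braced-string-syntax-table and reports whether the scan succeeds. On an Emacs version affected by this issue, the fullwidth case is expected to fail.

```elisp
(require 'bibtex)

(defun my/bibtex-brace-scan-ok-p (text)
  "Return t if `forward-sexp' can skip the braced TEXT, nil on `scan-error'.
Hypothetical helper: it scans under `bibtex-braced-string-syntax-table',
the table the comment above says parsebib uses."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (with-syntax-table bibtex-braced-string-syntax-table
      (condition-case nil
          (progn (forward-sexp 1) t)
        (scan-error nil)))))

;; ASCII parenthesis: punctuation in this table, so the scan succeeds.
(my/bibtex-brace-scan-ok-p "{(}")

;; Fullwidth parenthesis: expected to fail on affected Emacs versions.
(my/bibtex-brace-scan-ok-p "{（}")

;; Syntax class of the fullwidth parenthesis under this table,
;; as a one-character string ("." would mean punctuation).
(with-syntax-table bibtex-braced-string-syntax-table
  (string (char-syntax ?（)))
```

Evaluating the last form before and after applying a workaround shows whether the character is still treated as a paired delimiter.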

joostkremers • Jan 15 '24 15:01

I've created an Emacs bug here: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68477

In the meantime, you should be able to add FULLWIDTH LEFT PARENTHESIS and FULLWIDTH RIGHT PARENTHESIS to bibtex-braced-string-syntax-table yourself:

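(with-eval-after-load 'bibtex
  ;; U+FF08 FULLWIDTH LEFT PARENTHESIS and U+FF09 FULLWIDTH RIGHT PARENTHESIS,
  ;; not the ASCII parentheses, which are already punctuation in this table.
  (modify-syntax-entry ?（ "." bibtex-braced-string-syntax-table)
  (modify-syntax-entry ?） "." bibtex-braced-string-syntax-table))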

I suspect there will be more non-ASCII parentheses that would need to be added to bibtex-braced-string-syntax-table, so if any of those are problematic for you as well, you can add them in the same way.
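If several such characters turn up, the same change can be applied in a loop. This is only a sketch: the list of characters below is an illustrative assumption, not a complete inventory of non-ASCII brackets, and it is not what the eventual Emacs fix does. The characters are written with the ?\N{NAME} syntax so that they cannot be confused with their ASCII look-alikes.

```elisp
(with-eval-after-load 'bibtex
  ;; Illustrative selection only; adjust it to the characters that
  ;; actually occur unmatched in your .bib files.
  (dolist (ch '(?\N{FULLWIDTH LEFT PARENTHESIS}
                ?\N{FULLWIDTH RIGHT PARENTHESIS}
                ?\N{FULLWIDTH LEFT SQUARE BRACKET}
                ?\N{FULLWIDTH RIGHT SQUARE BRACKET}
                ?\N{LEFT CORNER BRACKET}
                ?\N{RIGHT CORNER BRACKET}))
    ;; "." is the punctuation syntax class, so the character is no
    ;; longer treated as one half of a pair.
    (modify-syntax-entry ch "." bibtex-braced-string-syntax-table)))
```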

This is only a work-around, of course, but it should help you deal with your issue until a new Emacs version is released with the fix.

joostkremers • Jan 15 '24 17:01