python-bibtexparser

Unexpected concatenation of field tokens

Open zepinglee opened this issue 2 years ago • 2 comments

Describe the bug

In the following example, the value of the field 10 # "~" # jan is expected to be 10~Jan., but the output of this library is 10 # "~".

BibTeX has three types of field tokens: a nonnegative number, a macro name (like jan), and a brace-balanced string delimited by either double quotes or braces. Tokens can be concatenated with the # character. Although the first type is called a "number", it behaves the same as a string: a .bst style can apply string slicing, text length, and concatenation to it.
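To make the expected behavior concrete, here is a minimal pure-Python sketch of how a concatenated field value could be evaluated. It does not use bibtexparser and is not how the library works internally; the function names (split_concatenation, resolve_token, evaluate_field_value) are made up for illustration, and undefined macros simply raise KeyError here.

```python
def split_concatenation(raw: str) -> list[str]:
    """Split a raw field value on '#' characters outside braces and quotes."""
    tokens, current = [], []
    depth = 0          # current brace nesting level
    in_quotes = False  # inside a double-quoted string (toggled only at depth 0)
    for ch in raw:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == '"' and depth == 0:
            in_quotes = not in_quotes
        if ch == "#" and depth == 0 and not in_quotes:
            tokens.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current).strip())
    return tokens


def resolve_token(token: str, macros: dict[str, str]) -> str:
    """Turn one field token into its string value."""
    if token.isdigit():
        return token  # a "number" token behaves like a string
    quoted = token[0] == '"' and token[-1] == '"'
    braced = token[0] == "{" and token[-1] == "}"
    if len(token) >= 2 and (quoted or braced):
        return token[1:-1]  # strip the outer delimiters
    return macros[token.lower()]  # macro names are case-insensitive


def evaluate_field_value(raw: str, macros: dict[str, str]) -> str:
    """Concatenate the resolved tokens of a raw field value."""
    return "".join(resolve_token(t, macros) for t in split_concatenation(raw))


print(evaluate_field_value('10 # "~" # jan', {"jan": "Jan."}))  # prints 10~Jan.
```

The number token 10 is kept as the string "10", which is why it can be joined with the other two tokens without any conversion.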

BTW, I've also made a bib2json.bst style that may help with testing. It reads .bib data and writes JSON (though with some limitations) to the .bbl output.

Reproducing

Version: 2.0.0b2

Code:

import bibtexparser
bibtex_str = '''
@STRING{ jan = "Jan." }

@INBOOK{inbook-full,
   month = 10 # "~" # jan,
}
'''
library = bibtexparser.parse_string(bibtex_str)
month = library.entries[0].fields_dict['month'].value
print(repr(month))
assert month == "10~Jan."

Output:

'10 # "~"'

zepinglee, Sep 15 '23 15:09

Thanks a lot for the beautiful bug report. This will probably have to be addressed in two distinct PRs:

  • [x] One PR to fix the splitter to contain the entire field, even if the field contains string concatenations.
  • [ ] One follow-up PR to adapt StringInterpolationMiddleware (and probably add a further middleware) to properly handle concatenation.

The first of these PRs is likely nontrivial.
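As a rough illustration of what the first PR needs to achieve, the sketch below scans a field value up to the first top-level comma or closing brace, so that every #-joined token stays in the raw value. This is not the library's splitter; read_raw_field_value is a hypothetical helper, and it assumes well-formed (brace-balanced) input.

```python
def read_raw_field_value(entry_body: str, start: int) -> tuple[str, int]:
    """Return the raw value beginning at `start` and the index where it ends.

    The value ends at the first ',' or '}' found at brace depth 0 and outside
    double quotes, so tokens joined with '#' are kept together.
    """
    depth = 0          # brace nesting level inside the value
    in_quotes = False  # inside a double-quoted string (toggled only at depth 0)
    i = start
    while i < len(entry_body):
        ch = entry_body[i]
        if ch == '"' and depth == 0:
            in_quotes = not in_quotes
        elif ch == "{":
            depth += 1
        elif ch == "}":
            if depth > 0:
                depth -= 1
            elif not in_quotes:
                break  # closing brace of the enclosing entry
        elif ch == "," and depth == 0 and not in_quotes:
            break  # separator before the next field
        i += 1
    return entry_body[start:i].strip(), i


body = ' month = 10 # "~" # jan,\n}'
value, end = read_raw_field_value(body, body.index("=") + 1)
print(repr(value))  # prints '10 # "~" # jan'
```

Resolving the tokens inside that raw value would then be left to the follow-up middleware change.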

P.S. I have not actually reproduced the issue, but given the nice issue description and the fact that token concatenation is not yet supported, I still added the reproduced label.

MiWeiss, Sep 15 '23 16:09

Note that string concatenation can also be used inside @string, and I've seen this in cryptobib. An example is:

@string{asiacryptname =         "ASIACRYPT"}
@string{asiacrypt91name =       asiacryptname # "'91"}
@string{asiacrypt92name =       auscryptname # "'92"}
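
A small pure-Python sketch of what resolving such definitions could look like, processing them in file order so that later macros can refer to earlier ones. It does not use bibtexparser; resolve_string_definitions is a hypothetical helper, and for brevity it assumes that no # appears inside a quoted or braced string and that every referenced macro is defined earlier (otherwise it raises KeyError).

```python
def resolve_string_definitions(definitions: list[tuple[str, str]]) -> dict[str, str]:
    """Expand @string definitions given as (name, raw value) pairs, in order."""
    macros: dict[str, str] = {}
    for name, raw in definitions:
        parts = []
        # Simplification: split on every '#', assuming none occur inside strings.
        for token in (t.strip() for t in raw.split("#")):
            if token.isdigit():
                parts.append(token)        # number token, kept as a string
            elif token[:1] in ('"', "{"):
                parts.append(token[1:-1])  # strip the outer delimiters
            else:
                parts.append(macros[token.lower()])  # earlier macro
        macros[name.lower()] = "".join(parts)
    return macros


macros = resolve_string_definitions([
    ("asiacryptname", '"ASIACRYPT"'),
    ("asiacrypt91name", 'asiacryptname # "\'91"'),
])
print(macros["asiacrypt91name"])  # prints ASIACRYPT'91
```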

kmccurley, Jun 21 '24 04:06