Unexpected concatenation of field tokens
Describe the bug
In the following example, the value of the field 10 # "~" # jan is expected to be 10~Jan., but this library outputs 10 # "~".
BibTeX has three types of field tokens: a nonnegative number, a macro name (like jan), and a brace-balanced string delimited by either double quotes or braces. They can be concatenated with the # character. Although the first type is called a "number", it behaves the same as a string: a .bst style can apply string slicing, text length, and concatenation to it.
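To make the expected behaviour concrete, here is a minimal sketch in plain Python of how a field value could be split on # and evaluated. It does not use bibtexparser at all; the names split_concat, resolve_token and evaluate are hypothetical, and it assumes macro names are looked up case-insensitively and are already defined.

```python
def split_concat(value: str) -> list[str]:
    """Split a raw field value on '#' characters that sit outside quotes and braces."""
    tokens, current, depth, in_quotes = [], [], 0, False
    for ch in value:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif ch == '"' and depth == 0:
            # A quote only opens or closes a string at brace depth 0.
            in_quotes = not in_quotes
        if ch == "#" and depth == 0 and not in_quotes:
            tokens.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current).strip())
    return tokens


def resolve_token(token: str, macros: dict[str, str]) -> str:
    """Turn one field token (number, delimited string, or macro name) into text."""
    if token.isdigit():
        return token  # a "number" is kept as its string of digits
    quoted = token[0] == '"' and token[-1] == '"'
    braced = token[0] == "{" and token[-1] == "}"
    if len(token) >= 2 and (quoted or braced):
        return token[1:-1]  # strip the outer delimiters only
    return macros[token.lower()]  # assumption: macro names are case-insensitive


def evaluate(value: str, macros: dict[str, str]) -> str:
    """Concatenate all resolved tokens of a field value."""
    return "".join(resolve_token(t, macros) for t in split_concat(value))


print(evaluate('10 # "~" # jan', {"jan": "Jan."}))  # prints 10~Jan.
```

A # inside a quoted or braced string is left alone, so '"a # b" # {c#d}' evaluates to 'a # bc#d'.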
BTW, I've also made a bib2json.bst style that may help with testing. It reads .bib data and writes it as JSON (though with some limitations) to the .bbl output.
Reproducing
Version: 2.0.0b2
Code:
import bibtexparser
bibtex_str = '''
@STRING{ jan = "Jan." }
@INBOOK{inbook-full,
month = 10 # "~" # jan,
}
'''
library = bibtexparser.parse_string(bibtex_str)
month = library.entries[0].fields_dict['month'].value
print(repr(month))
assert month == "10~Jan."
Output:
'10 # "~"'
Thanks a lot for the beautiful bug report. This will probably have to be addressed in two distinct PRs:
- [x] One PR to fix the splitter to contain the entire field, even if the field contains string concatenations.
- [ ] One follow-up PR to adapt StringInterpolationMiddleware (and probably add a further middleware) to properly handle concatenation.
The first of these PRs is likely nontrivial.
P.S. I have not actually reproduced the issue, but given the nice issue description and the fact that token concatenation is not yet supported, I still added the reproduced label.
Note that string concatenation can also be used inside @string, and I've seen this in cryptobib. An example is:
@string{asiacryptname = "ASIACRYPT"}
@string{asiacrypt91name = asiacryptname # "'91"}
@string{asiacrypt92name = auscryptname # "'92"}
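For illustration, here is a rough sketch of how @string definitions that reference earlier macros could be expanded one after another. It is plain Python and not bibtexparser's API; expand and build_macros are hypothetical names, the token pattern is deliberately simplified (no nested braces), and it assumes definitions are processed in file order so that a macro is defined before it is used.

```python
import re

# Simplified token pattern: a quoted string, a flat braced string,
# or a bare number / macro name. Assumption: no nested braces.
TOKEN = re.compile(r'"[^"]*"|\{[^{}]*\}|[^\s#"{}]+')


def expand(value: str, macros: dict[str, str]) -> str:
    """Expand one raw value using the macros known so far."""
    parts = []
    for tok in TOKEN.findall(value):
        if tok[0] in '"{':
            parts.append(tok[1:-1])           # delimited string: drop delimiters
        elif tok.isdigit():
            parts.append(tok)                 # number: keep the digits as text
        else:
            parts.append(macros[tok.lower()])  # macro: must already be defined
    return "".join(parts)


def build_macros(definitions: list[tuple[str, str]]) -> dict[str, str]:
    """Process (name, raw value) pairs in order, expanding each against earlier ones."""
    macros: dict[str, str] = {}
    for name, raw in definitions:
        macros[name.lower()] = expand(raw, macros)
    return macros


macros = build_macros([
    ("asiacryptname", '"ASIACRYPT"'),
    ("asiacrypt91name", 'asiacryptname # "\'91"'),
])
print(macros["asiacrypt91name"])  # prints ASIACRYPT'91
```

With this ordering, a reference to a macro that has not been defined yet (such as auscryptname when only the lines above are given) raises a KeyError here; how the library should report that case is a separate design question.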