python-bibtexparser
Error in latex_to_unicode
Describe the bug
The latex_to_unicode function can fail with a rather obscure type error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/dorian/Programs/github/bibtex-autocomplete/venv/lib/python3.11/site-packages/bibtexparser/latexenc.py", line 65, in latex_to_unicode
string = _replace_all_latex(string, itertools.chain(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dorian/Programs/github/bibtex-autocomplete/venv/lib/python3.11/site-packages/bibtexparser/latexenc.py", line 53, in _replace_all_latex
string = _replace_latex(string, l.rstrip(), u)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dorian/Programs/github/bibtex-autocomplete/venv/lib/python3.11/site-packages/bibtexparser/latexenc.py", line 35, in _replace_latex
if unicodedata.combining(unicod):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: combining() argument must be a unicode character, not str
The problem is most likely due to lines like this one, where the Unicode replacement isn't a single character: https://github.com/sciunto-org/python-bibtexparser/blob/e4c6eb656f26363eab91510f209a9d7e32942db9/bibtexparser/latexenc.py#L941-L945
(Although this isn't the only example)
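To illustrate the suspected cause, here is a minimal sketch that uses only the standard library. The helper name `combining_or_error` is made up for this example and is not part of bibtexparser, and the two-character string stands in for any replacement longer than one character.

```python
import unicodedata


def combining_or_error(text):
    """Return the combining class of text, or the TypeError message if the call fails.

    Hypothetical helper for illustration only; not part of bibtexparser.
    """
    try:
        return unicodedata.combining(text)
    except TypeError as exc:
        return f"TypeError: {exc}"


# A single combining character (combining acute accent) has a non-zero class.
print(combining_or_error("\u0301"))
# A single ordinary letter has class 0.
print(combining_or_error("a"))
# A string of more than one character is rejected with a TypeError.
print(combining_or_error("ab"))
```

The last call shows the same kind of TypeError as the traceback above, which is consistent with a replacement string longer than one character reaching unicodedata.combining.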
Reproducing
Version: 1.4.1
Code:
from bibtexparser.latexenc import latex_to_unicode
latex_to_unicode("\\;")
Remaining Questions (Optional) Please tick all that apply:
- [x] I would be willing to contribute a PR to fix this issue: my solution would be to put a try/except block around the call to unicodedata.combining and assume false if it fails. I haven't submitted this directly because I don't know what these entries that are not single Unicode characters are or why they are there. If there is a good reason for them, there is probably a better way to handle them; if not, they should probably be removed.
- [ ] This issue is a blocker, I'd be grateful for an early fix.
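The try/except idea could look roughly like the sketch below. The helper name `is_combining` is hypothetical and does not exist in bibtexparser; how it would be wired into `_replace_latex` is left open, since that depends on what the multi-character entries are meant to do.

```python
import unicodedata


def is_combining(unicod):
    """Return True only if unicod is a single combining character.

    Hypothetical helper: anything unicodedata.combining rejects (for example
    a string of several characters) is treated as non-combining.
    """
    try:
        return unicodedata.combining(unicod) != 0
    except TypeError:
        return False


print(is_combining("\u0301"))  # single combining character
print(is_combining("a"))       # single ordinary character
print(is_combining("ab"))      # multi-character string, no exception raised
```

An explicit `len(unicod) == 1` check before calling unicodedata.combining would probably work as well and avoids relying on the exception, but either way the multi-character entries are simply treated as non-combining, which may or may not be the right behaviour for them.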
Related issue: https://github.com/dlesbre/bibtex-autocomplete/issues/12