python-bibtexparser

URLs need more corner case coverage

Open ldzzz opened this issue 3 years ago • 2 comments

Hi there,

while testing whether the library fits my needs, I came across some issues in how URLs, and links in general, are handled.

Here is an example that is fairly common when the howpublished field is used:

@misc{test01,
 address = {Massachusetts PA.XDD\_1\_2\_3\%@36},
 author = {Leslie Lamport},
 edition = {2},
 howpublished = {\url{https://github.com/sciunto-org/python-bibtexparser}}, % or  howpublished = "\url{https://github.com/sciunto-org/python-bibtexparser}",
 publisher = {Addison Wesley},
 title = {{L}aTe{X} Document {P}reparation {S}ystem},
 year = {1994}
}

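I use the following code for the conversion:

import bibtexparser
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import homogenize_latex_encoding

# form_bibtex holds the BibTeX entry above as a string
bibparser = BibTexParser(common_strings=True)
bibparser.customization = homogenize_latex_encoding
bib_data = bibtexparser.loads(form_bibtex, parser=bibparser)
cleaned_bib_data = bibtexparser.dumps(bib_data)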

In this case the howpublished field gets converted to something like howpublished = {r\ulhttps://github.com/sciunto-org/python-bibtexparser}, when it should remain unchanged, since \url is already handled by the hyperref package.

Furthermore, is there any way to handle non-standard fields like url = {} and exclude them from the parsing and character-escaping steps? Such a field does not need any escapes to be displayed correctly with BibTeX (or natbib, according to the Internet). The same issue occurs as above: special characters are replaced, even though a url = {} field can handle URLs on its own.
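
For illustration, one way such an exclusion could look is a wrapper around the customization function. This is only a sketch under assumptions: protect_fields is a hypothetical helper, not part of bibtexparser, and it assumes that a customization is a plain function that takes an entry dict (with lowercase field names) and returns it. The escape_underscores function is a stand-in so the example runs without the library.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, str]
Customization = Callable[[Record], Record]


def protect_fields(
    customization: Customization,
    protected: Iterable[str] = ("url", "howpublished"),
) -> Customization:
    """Hypothetical helper: run `customization`, then restore protected fields."""
    protected_set = frozenset(protected)

    def wrapper(record: Record) -> Record:
        # Copy the protected values before the customization can touch them.
        saved = {k: v for k, v in record.items() if k in protected_set}
        result = customization(record)
        # Put the untouched originals back.
        result.update(saved)
        return result

    return wrapper


def escape_underscores(record: Record) -> Record:
    """Stand-in customization: escapes "_" in every field except ID."""
    return {
        k: (v if k == "ID" else v.replace("_", r"\_"))
        for k, v in record.items()
    }


entry = {"ID": "test01", "title": "a_b", "url": "https://example.org/a_b"}
cleaned = protect_fields(escape_underscores)(entry)
print(cleaned["title"])  # escaped by the stand-in customization
print(cleaned["url"])    # left exactly as it was
```

With the parser from the snippet above, the wrapper would presumably be assigned as bibparser.customization = protect_fields(homogenize_latex_encoding). That wiring is untested against the library itself.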

Cheers :)

ldzzz, Aug 05 '22 09:08

Would you mind posting your exact code, including customizations?

MiWeiss, Aug 16 '22 12:08

Hi, yes, the code I use is the following (I updated the original description as well):

bibparser = BibTexParser(common_strings=True)
bibparser.customization = homogenize_latex_encoding
bib_data = bibtexparser.loads(form_bibtex, parser=bibparser)
cleaned_bib_data = bibtexparser.dumps(bib_data)

ldzzz, Sep 02 '22 10:09

This is fixed in v2, where the "\url" part is simply removed during LaTeX decoding.
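
To make "simply removed" concrete, here is a rough illustration of the effect on a field value. It is not the v2 implementation: strip_url_command is a hypothetical name, and the regex assumes the \url argument contains no nested braces.

```python
import re

# Matches \url{...} where the argument has no nested braces (an assumption).
_URL_COMMAND = re.compile(r"\\url\{([^{}]*)\}")


def strip_url_command(value: str) -> str:
    # Replace each \url{X} with its bare argument X.
    return _URL_COMMAND.sub(r"\1", value)


howpublished = r"\url{https://github.com/sciunto-org/python-bibtexparser}"
print(strip_url_command(howpublished))
```

The URL itself passes through unchanged; only the wrapping command is dropped.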

MiWeiss, May 26 '23 14:05