python-bibtexparser URLs need more corner case coverage

Hi there,

while testing whether the library fits my needs, I've come across some issues in how URLs, and links in general, are handled.

A not uncommon example, using the howpublished field:

@misc{test01,
 address = {Massachusetts PA.XDD\_1\_2\_3\%@36},
 author = {Leslie Lamport},
 edition = {2},
 howpublished = {\url{https://github.com/sciunto-org/python-bibtexparser}}, % or  howpublished = "\url{https://github.com/sciunto-org/python-bibtexparser}",
 publisher = {Addison Wesley},
 title = {{L}aTe{X} Document {P}reparation {S}ystem},
 year = {1994}
}

Using the following code to convert:

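import bibtexparser
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import homogenize_latex_encoding

bibparser = BibTexParser(common_strings=True)
bibparser.customization = homogenize_latex_encoding
bib_data = bibtexparser.loads(form_bibtex, parser=bibparser)  # form_bibtex: the entry above, as a string
cleaned_bib_data = bibtexparser.dumps(bib_data)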

In this case the howpublished field gets converted to something like howpublished = {r\ulhttps://github.com/sciunto-org/python-bibtexparser}, when it should remain unchanged, since the URL is already handled by hyperref's \url command.

Furthermore, is there any way to handle non-standard fields like url = {} and exclude them from the parsing/character-escaping methods? Such a field does not need any escapes to be displayed correctly with BibTeX (or natbib, according to the Internet). The same issue occurs as above: special characters are replaced, even though the url = {} field can handle URLs on its own.
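To make the request concrete, here is a rough sketch of the behaviour I have in mind. It assumes that a customization is a plain function taking a record dict and returning the (possibly modified) record, and that field names arrive as lowercase keys. The names protect_fields and escape_everything are hypothetical and not part of bibtexparser; escape_everything is only a stand-in for a real escaping customization.

```python
from typing import Callable, Dict, Iterable

Record = Dict[str, str]


def protect_fields(
    customization: Callable[[Record], Record],
    fields: Iterable[str] = ("url", "howpublished"),
) -> Callable[[Record], Record]:
    """Wrap a record customization so the given fields pass through untouched.

    Hypothetical helper, not part of bibtexparser.
    """
    protected = frozenset(fields)

    def wrapper(record: Record) -> Record:
        # Remember the raw values before the customization rewrites them.
        saved = {key: record[key] for key in protected if key in record}
        result = customization(record)
        # Put the raw values back, overwriting whatever the customization produced.
        result.update(saved)
        return result

    return wrapper


def escape_everything(record: Record) -> Record:
    # Stand-in for an escaping customization: escapes underscores in every field.
    return {key: value.replace("_", r"\_") for key, value in record.items()}


entry = {"ID": "test01", "url": "https://example.org/a_b", "note": "a_b"}
cleaned = protect_fields(escape_everything)(entry)
```

With the parser above, this would presumably be wired in as bibparser.customization = protect_fields(homogenize_latex_encoding). I have not verified that against the library, so treat it as an assumption.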

Cheers :)

Aug 05 '22 09:08 ldzzz

Would you mind posting your exact code, including customizations?

Aug 16 '22 12:08 MiWeiss

Hi, yes, the code I use is the following (I've updated the original description as well):

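import bibtexparser
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import homogenize_latex_encoding

bibparser = BibTexParser(common_strings=True)
bibparser.customization = homogenize_latex_encoding
bib_data = bibtexparser.loads(form_bibtex, parser=bibparser)  # form_bibtex: the entry above, as a string
cleaned_bib_data = bibtexparser.dumps(bib_data)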

Sep 02 '22 10:09 ldzzz

This is fixed in v2, where the "\url" part is simply removed when doing LaTeX decoding.

May 26 '23 14:05 MiWeiss