python-bibtexparser
Review homogeneize_latex_encoding()
homogeneize_latex_encoding() has several issues:
- Accent protection should act only on certain fields
- We should check carefully that accents are correctly encoded
I have an issue with homogeneize_latex_encoding().
One of my articles has the key something_2010, and it gets escaped as something\_2010 by homogeneize_latex_encoding(). But such an identifier does not work in LaTeX (error on the \cite line).
Did I miss something?
No, you didn't. It's because homogeneize_latex_encoding() does not distinguish between data and metadata (point 1). I'm working on a partial fix.
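To make the data/metadata distinction concrete, here is a minimal sketch, not the library's actual fix. The helper `escape_latex_specials`, the function `escape_data_fields`, and the metadata key names `ID` and `ENTRYTYPE` are all assumptions made for illustration; the real record layout may differ between versions.

```python
import re

# Hypothetical escaping helper, standing in for the library's real conversion.
# It backslash-escapes a few LaTeX special characters unless already escaped.
_SPECIALS = re.compile(r"(?<!\\)([_&%#])")


def escape_latex_specials(value):
    return _SPECIALS.sub(r"\\\1", value)


# Assumed names for the metadata keys of a record (citation key and entry type).
METADATA_KEYS = frozenset({"ID", "ENTRYTYPE"})


def escape_data_fields(record):
    """Return a copy of record with data fields escaped and metadata untouched."""
    return {
        key: value if key in METADATA_KEYS else escape_latex_specials(value)
        for key, value in record.items()
    }


record = {"ID": "something_2010", "ENTRYTYPE": "article", "title": "Profit & loss_2010"}
escaped = escape_data_fields(record)
print(escaped["ID"])     # the citation key is left as something_2010
print(escaped["title"])  # the title has & and _ escaped
```

With this split, \cite{something_2010} keeps working because the key is never rewritten.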
Another question about this.
I'm using a custom BibTeX field called "file" to store the paths to the PDF files of my articles. If I use homogeneize_latex_encoding(), the special characters (such as _) in my filenames get escaped and I can't use them in my Python scripts.
However, I'd rather have a homogeneous LaTeX encoding in the BibTeX file I write.
So maybe it would be a good idea to also implement an unhomogeneize_latex_encoding function, or (easier, I think) to be able to specify the customization at writing time.
What do you think of this?
I do not think writing an unhomogeneize_latex_encoding function is a good way to go. It would not be natural from the user's point of view.
However, homogeneize_latex_encoding() should accept an extra optional argument, a dict, to specify either:
- the fields of a record that are treated by string_to_latex (title, author, abstract)
- or the fields that are NOT treated.
For now, I'm in favor of solution 1: it is probably shorter, and a default list might work for a broader range of use cases.
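Solution 1 could look roughly like the sketch below. This is only an illustration of the proposal, not the signature that was eventually implemented: `homogenize_fields`, `DEFAULT_FIELDS` and the stand-in `to_latex` are hypothetical names, and a tuple of field names is used here instead of the dict mentioned above to keep the example short.

```python
import re

# Hypothetical stand-in for string_to_latex: escapes a few special characters
# unless they are already preceded by a backslash.
_SPECIALS = re.compile(r"(?<!\\)([_&%#])")


def to_latex(value):
    return _SPECIALS.sub(r"\\\1", value)


# Assumed default: only these fields are converted.
DEFAULT_FIELDS = ("title", "author", "abstract")


def homogenize_fields(record, fields=DEFAULT_FIELDS):
    """Convert only the listed fields; every other field is left as-is."""
    for name in fields:
        if name in record:
            record[name] = to_latex(record[name])
    return record


entry = {"ID": "smith_2010", "title": "R&D spending", "file": "papers/smith_2010.pdf"}
homogenize_fields(entry)

# Opting in to an extra field is an explicit choice by the caller.
other = homogenize_fields({"title": "a_b", "file": "a_b.pdf"}, fields=("title", "file"))
```

With the default list, the key and the custom "file" field stay usable from Python, while the title is converted.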
I agree about the unhomogeneize_latex_encoding function.
But maybe homogeneize_latex_encoding could be called when writing with bwriter? This is already possible by calling it explicitly before writing. But I think the writing functions could have a customization parameter, just as the reader has.
And I'm in favor of an extra optional argument with default fields to treat as well.
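The writer-side idea can be sketched as follows. None of this is the bwriter API: `write_entries`, `format_entry` and `escape_title_underscores` are hypothetical names, and the entry layout is a simplified assumption. The point is only that the customization runs on a copy at writing time, so the in-memory records keep their raw values.

```python
import copy


def escape_title_underscores(record):
    # Hypothetical customization: escape underscores in the title only.
    record["title"] = record["title"].replace("_", r"\_")
    return record


def format_entry(record):
    """Serialize one record as a BibTeX entry (minimal, assumed layout)."""
    fields = [
        f"  {key} = {{{value}}}"
        for key, value in sorted(record.items())
        if key not in ("ID", "ENTRYTYPE")
    ]
    return "@%s{%s,\n%s\n}\n" % (record["ENTRYTYPE"], record["ID"], ",\n".join(fields))


def write_entries(records, customization=None):
    """Apply the optional customization to a copy of each record, then serialize."""
    chunks = []
    for record in records:
        if customization is not None:
            record = customization(copy.deepcopy(record))
        chunks.append(format_entry(record))
    return "\n".join(chunks)


entries = [{
    "ID": "something_2010",
    "ENTRYTYPE": "article",
    "title": "snake_case names",
    "file": "papers/something_2010.pdf",
}]
text = write_entries(entries, customization=escape_title_underscores)
print(text)
```

After the call, entries[0]["title"] is still the unescaped string, so scripts that read the "file" or "title" fields are unaffected by what was written out.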
OK for the extra arg in homogeneize_latex_encoding.
I do not understand why you need a customization callback in the bwriter functions. What prevents you from passing your customization functions to the parser itself?
Actually, yes I can pass it myself. I was just thinking that a symmetric behaviour of the reader and writer with the customization option could be nice. But it may be a stupid idea…
This issue is 10 years old, and much has changed since. Should similar problems still pop up, it would probably be best to open a new issue.