DocumenterCitations.jl

Convert common TeX to unicode

Open simonbyrne opened this issue 3 years ago • 13 comments

I set it up here, and noticed a few TeX artifacts.

I would suggest at least converting two dashes (--) to an en-dash (–), as those are very common and hard to type.
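
A minimal sketch of that dash replacement, written in Python for illustration only. The package itself is Julia, and the function name `tex_dashes_to_unicode` is hypothetical, not part of DocumenterCitations.jl:

```python
def tex_dashes_to_unicode(text: str) -> str:
    """Replace TeX dash ligatures with their Unicode equivalents."""
    # Handle the three-dash form first so it is not consumed as "--" plus "-".
    text = text.replace("---", "\u2014")  # TeX "---" becomes U+2014
    text = text.replace("--", "\u2013")   # TeX "--" becomes U+2013 (en dash)
    return text


print(tex_dashes_to_unicode("pp. 1--45"))
```

The order of the two replacements matters: doing `--` first would leave a stray hyphen behind every `---`.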

simonbyrne avatar Oct 23 '20 05:10 simonbyrne

PR #17 starts working on this (see below). I'll tag v0.1.1 or v0.2.0 shortly after it's merged.

@simonbyrne Do you know of a list of these kinds of replacements or should we just add to it as we go along?

I was able to find lists for math TeX to unicode (https://github.com/svenkreiss/unicodeit/blob/master/unicodeit/data.py) but not so much for text replacements.

ali-ramadhan avatar Oct 23 '20 12:10 ali-ramadhan

I agree we probably don't want the full unicodeit list, as it seems to include both math and text commands

simonbyrne avatar Oct 23 '20 16:10 simonbyrne

Reopening this. We're still seeing issues, e.g. Jo { \~a } o Teixeira, over at the ClimateMachine.jl refs.

charleskawczynski avatar Nov 30 '20 21:11 charleskawczynski

This is due to spurious spaces being inserted by the BibTeX parser. Upstream issue is https://github.com/Azzaare/BibParser.jl/issues/5.

simonbyrne avatar Dec 01 '20 16:12 simonbyrne

Should we leave this open until the upstream is closed?

charleskawczynski avatar Dec 01 '20 16:12 charleskawczynski

Yes, probably a good idea.

simonbyrne avatar Dec 01 '20 17:12 simonbyrne

Hi there! Sorry for the long wait, spurious braces should not be a problem anymore.

It might only be a crude parser that I handcrafted, but BibParser.jl got updated today (v0.1.11). The new parser should handle any valid BibTeX entry, but it does not replace LaTeX commands from a @preamble, nor does it convert LaTeX to Unicode.

Azzaare avatar Apr 16 '21 09:04 Azzaare

I've created a GitHub repo to convert LaTeX ⇋ Unicode: https://github.com/Humans-of-Julia/LaTeXUniCode.jl It is almost empty at the moment, but I will work on it during summer (as I will be in between two jobs, I can have some fun!)

Anyway, if some of you want to join, you're all welcome aboard.

Azzaare avatar Jun 05 '21 12:06 Azzaare

The function tex2unicode is there, but it does not seem to be applied to pages, which is where I see them most often:

pages = {1 -- 45},

Could this be done?
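
A rough sketch of what that could look like for the pages field, in Python for illustration only. `normalize_pages` and the sample `entry` dictionary are made up for this example and do not reflect how DocumenterCitations.jl stores entries:

```python
import re

# Two or three hyphens, with any surrounding whitespace, form one range separator.
DASH_RUN = re.compile(r"\s*-{2,3}\s*")


def normalize_pages(pages: str) -> str:
    """Turn a BibTeX page range such as '1 -- 45' into one joined by an en dash."""
    return DASH_RUN.sub("\u2013", pages.strip())


# Hypothetical parsed entry; only the pages field is touched.
entry = {"title": "An example", "pages": "1 -- 45"}
entry["pages"] = normalize_pages(entry["pages"])
print(entry["pages"])
```

Single hyphens are left alone here, so a page value without a range passes through unchanged.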

fingolfin avatar Jun 22 '21 10:06 fingolfin

tex2unicode is currently only applied to title? https://github.com/ali-ramadhan/DocumenterCitations.jl/blob/886bbb740ea2f814ec67e321e61ae16149e58fc2/src/bibliography.jl#L50-L54 Or am I misunderstanding it?

LazyScholar avatar Jun 23 '21 17:06 LazyScholar

No, you are right. Hmm, I thought I'd made a PR also applying it to the output of xin... guess I forgot :/

fingolfin avatar Jun 23 '21 18:06 fingolfin

I converted all my .bib files to Unicode, so I did not realize that applying it to the authors and maybe published_in might fix it for others.

~~@fingolfin do you want to make the PR (you can delete line 51 as with your last change the year is not needed any more)?~~

LazyScholar avatar Jun 23 '21 18:06 LazyScholar

> Reopening this. We're still seeing issues, e.g. Jo { \~a } o Teixeira, over at the ClimateMachine.jl refs.

@charleskawczynski Is { \~a } valid TeX? As far as I know, in order to get ã one has to use \~{a} or even \tilde{a} (not sure if the latter is supported by BibTeX). Source: https://en.wikibooks.org/wiki/LaTeX/Special_Characters#Escaped_codes
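
To make the accent forms concrete, here is a small Python sketch (illustrative only, not the package's Julia code) that accepts both `\~{a}` and `\~a`, with or without an enclosing brace group. The name `accents_to_unicode` and the five-entry accent table are assumptions for this example; a real converter would need a much larger table:

```python
import re
import unicodedata

# TeX accent command -> Unicode combining mark (a small, incomplete table).
COMBINING = {"~": "\u0303", "'": "\u0301", "`": "\u0300", "^": "\u0302", '"': "\u0308"}

# Matches \~{a} as well as \~a, tolerating whitespace around the letter.
ACCENT = re.compile(r"""\\([~'`^"])\s*(?:\{\s*([A-Za-z])\s*\}|([A-Za-z]))""")

# Matches a brace group that wraps exactly one non-space character.
SINGLE_CHAR_GROUP = re.compile(r"\{\s*(\S)\s*\}")


def accents_to_unicode(text: str) -> str:
    """Convert simple TeX accent commands to precomposed Unicode letters."""
    def compose(match: re.Match) -> str:
        letter = match.group(2) or match.group(3)
        # NFC merges the letter and the combining mark into one code point.
        return unicodedata.normalize("NFC", letter + COMBINING[match.group(1)])

    text = ACCENT.sub(compose, text)
    # Drop braces left around a single converted character, e.g. "{ã}".
    return SINGLE_CHAR_GROUP.sub(r"\1", text)


print(accents_to_unicode(r"Jo\~{a}o Teixeira"))
print(accents_to_unicode(r"Jo{\~a}o Teixeira"))
```

This sketch only converts the accent itself. Spaces that a parser inserts outside the brace group, as in `Jo { \~a } o`, would still be left in the name.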

LazyScholar avatar Jun 23 '21 23:06 LazyScholar