citeproc-py

bibutils macros are not all supported by BibTeX parser

Open jayvdb opened this issue 8 years ago • 1 comments

bibutils is a commonly used converter. It can help citeproc-py users by converting records to BibTeX (.bib) for import into citeproc-py.

I've created an archive of bibutils releases at https://github.com/jayvdb/bibutils-archive so that specific versions are easier to navigate and link to.

The bibutils LaTeX macro/encoding mapping is at https://github.com/jayvdb/bibutils-archive/blob/master/lib/latex.c . It is only accessed by the functions latex2char and uni2latex. When writing, it always emits the bib1 member of the struct for each Unicode character.

It would be good to ensure that all LaTeX sequences it emits are accepted by citeproc-py's BibTeX parser.

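The following is a diff of the supported macros after a quick cleanup and comparison. It uses only the macros whose names match ^[a-zA-Z]*$, which quickly excludes the more complex macros. Lines starting with - are emitted by bibutils only, lines starting with + are supported by citeproc-py only, and unmarked lines are common to both: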

--- bibutils
+++ citeproc
 aa
 AA
 ae
 AE
+b
+c
+copyright
+d
+dag
+ddag
+dh
+DH
 dj
 DJ
-emspace
-enspace
+G
+guillemotleft
+guillemotright
+guilsinglleft
+guilsinglright
+H
 i
+k
 l
 L
-ldots
+ng
+NG
 o
 O
 oe
 OE
+P
+pounds
+quotedblbase
+quotesinglbase
+r
+S
 ss
-textacutedbl
+t
+TeX
-textasciiacute
-textasciiacutex
 textasciicircum
-textasciigrave
 textasciitilde
-textbaht
-textbardbl
-textbrokenbar
+textasteriskcentered
+textbackslash
+textbar
+textbraceleft
+textbraceright
 textbullet
-textcelcius
-textcent
-textcircledP
+textcircled
 textcopyright
 textdagger
 textdaggerdbl
-textdegree
-textdiv
-textdong
-textdownarrow
-textestimated
-texteuro
+textdollar
+textellipsis
+textemdash
+textendash
 textexclamdown
-textflorin
-textfractionsolidus
-textfrenchfranc
-textlangle
-textleftarrow
-textlira
-textlnot
-textlquill
-textmho
-textmu
-textnaira
-textnospace
-textnumero
-textohm
-textonehalf
-textonequarter
-textonesuperior
-textopenbullet
+textgreater
+textless
 textordfeminine
 textordmasculine
 textparagraph
 textperiodcentered
-textpertenthousand
-textpm
 textquestiondown
-textrangle
+textquotedbl
+textquotedblleft
+textquotedblright
+textquoteleft
+textquoteright
 textregistered
-textrightarrow
-textrquill
 textsection
-textservicemark
 textsterling
-textsurd
-texttenthousand
-textthreequarters
-textthreesuperior
-texttimes
 texttrademark
-texttwosuperior
-textuparrow
+textunderscore
 textvisiblespace
-textwon
-textyen
+th
+TH
-thinspace
+u
+U
+v
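
For anyone wanting to reproduce a listing like the one above, here is a minimal sketch of the comparison. It assumes the two macro name lists have already been extracted by hand; `simple_macros` and `macro_diff` are hypothetical helper names, not part of bibutils or citeproc-py. It uses `[a-zA-Z]+` rather than `*` so that empty names are skipped.

```python
import re

# Only purely alphabetic macro names are compared, as in the listing above.
SIMPLE_MACRO = re.compile(r"[a-zA-Z]+")


def simple_macros(names):
    """Keep only the names made up entirely of ASCII letters."""
    return {name for name in names if SIMPLE_MACRO.fullmatch(name)}


def macro_diff(bibutils_names, citeproc_names):
    """Return diff-style lines: '-' bibutils only, '+' citeproc only, ' ' both."""
    left = simple_macros(bibutils_names)
    right = simple_macros(citeproc_names)
    lines = []
    # Case-insensitive order, lowercase before uppercase (aa, AA, ae, AE, ...)
    for name in sorted(left | right, key=lambda n: (n.lower(), n.swapcase())):
        if name in left and name in right:
            prefix = " "
        elif name in left:
            prefix = "-"
        else:
            prefix = "+"
        lines.append(prefix + name)
    return lines


# Small sample taken from the listing above; "'" stands in for a
# non-alphabetic macro and is only here to show the filtering.
bibutils_sample = ["aa", "AA", "ldots", "textbullet", "'"]
citeproc_sample = ["aa", "AA", "copyright", "textbullet"]
print("\n".join(macro_diff(bibutils_sample, citeproc_sample)))
```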

Are there any of these that are not suitable for citeproc-py's BibTeX parser?

If there is general support for keeping in sync with at least a subset of the bibutils LaTeX output, we could write a test that parses lib/latex.c to ensure the defined subset is supported.
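
A rough sketch of what such a test helper could look like. It assumes each table entry in lib/latex.c has the shape `{ <code point>, "<bib1>", ... }`, which should be checked against the real file. The `SAMPLE` entries are made up for illustration, and `supported` stands in for whatever set of macro names citeproc-py's parser accepts (I have not tied it to an actual citeproc-py API here).

```python
import re

# Assumed entry shape: { <code point>, "<bib1>", "<bib2>", ... }
# Only the first string (bib1) is captured, since that is what gets emitted.
ENTRY = re.compile(r'\{\s*(\d+)\s*,\s*"((?:[^"\\]|\\.)*)"')
# Purely alphabetic macro names, e.g. \AE or \ldots
MACRO = re.compile(r"\\([a-zA-Z]+)")


def emitted_macros(c_source):
    """Collect the alphabetic macro names found in the bib1 strings."""
    names = set()
    for _codepoint, bib1 in ENTRY.findall(c_source):
        # C string literals escape the backslash, so undo that first.
        latex = bib1.replace("\\\\", "\\")
        names.update(MACRO.findall(latex))
    return names


def unsupported_macros(c_source, supported):
    """Macros emitted by the table that are missing from `supported`."""
    return sorted(emitted_macros(c_source) - set(supported))


# Illustrative entries only; not copied from the real lib/latex.c.
SAMPLE = r'''
   { 198,  "{\\AE}",    "\\AE",    "" },
   { 8230, "{\\ldots}", "\\ldots", "" },
'''

print(unsupported_macros(SAMPLE, supported={"AE"}))
```

A real test would read lib/latex.c from a pinned bibutils release instead of `SAMPLE` and fail when `unsupported_macros` returns anything outside an agreed list of exclusions.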

jayvdb, Feb 15 '16 07:02

I had a quick look at the list, and I think most of them simply map to Unicode characters.
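
To illustrate what "simply map to Unicode characters" could mean in practice, here is a small sketch covering a handful of the macros that are missing on the citeproc side. The table and the `expand_simple_macros` helper are hypothetical and are not how citeproc-py's parser is actually structured.

```python
import re

# A few argument-less macros from the list and their Unicode equivalents.
UNICODE_FOR_MACRO = {
    "textdegree": "\u00b0",   # degree sign
    "texteuro": "\u20ac",     # euro sign
    "textyen": "\u00a5",      # yen sign
    "textonehalf": "\u00bd",  # vulgar fraction one half
    "textpm": "\u00b1",       # plus-minus sign
    "texttimes": "\u00d7",    # multiplication sign
}

# A macro name, optionally followed by an empty group: \texteuro or \texteuro{}
MACRO = re.compile(r"\\([a-zA-Z]+)(?:\{\})?")


def expand_simple_macros(text):
    """Replace known macros with their character; leave unknown ones as-is."""
    def substitute(match):
        return UNICODE_FOR_MACRO.get(match.group(1), match.group(0))
    return MACRO.sub(substitute, text)


print(expand_simple_macros(r"20\textdegree{} C, 5\texteuro{}"))
```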

brechtm, Feb 17 '16 20:02