JSON reference data containing square brackets mistakenly produces Mathjax?

Open scokobro opened this issue 5 years ago • 7 comments

First of all, thanks for manubot. I am looking forward to seeing it develop!

After following the setup instructions and getting my first manubot paper up and running (this went very smoothly), I encountered an issue where some citations containing square brackets are rendered as MathJax (I think).

My reference data is in the manual-references.json file and I am citing using standard pandoc citations. I am using an APA author-year style CSL file. The JSON used is the output from the pandoc-citeproc command (converted from bibtex).

The reference data that leads to the odd-looking citation is...

{   "id": "nhknenkan43",
    "issued": {
      "date-parts": [
        [
          1943
        ]
      ]
    },
    "publisher": "NHK",
    "publisher-place": "Tokyo",
    "title": "Shōwa 18-nen rajio nenkan \\[1943 radio yearbook\\]",
    "type": "book"
  }

As you can see, this work has no author, so the default seems to be to use the 'title' field instead. This means the in-text citation ends up rendered like this...

image

Maybe the \\ is being interpreted somewhere as a hard line break? Anyway, the produced HTML for this looks like this...

image

I realise that the main constituency for manubot is probably not social science people like me, but you never know! I should mention that I use pandoc, with the same bib data, to generate my documents locally and do not see this problem there.

scokobro avatar Sep 15 '20 05:09 scokobro

@scokobro thanks for reporting this and trying Manubot. We are interested in supporting the social sciences.

Do you have this example in a public manuscript GitHub repository? I'd like to try to reproduce it locally.

agitter avatar Sep 21 '20 03:09 agitter

Hi @agitter, thanks for getting back on this. The repository is https://github.com/scokobro/RT-manubot

scokobro avatar Sep 21 '20 10:09 scokobro

If it's of any use, this is the original bibtex, before conversion to JSON...

@book{nhknenkan43,
	Address = {Tokyo},
	Booktitle = {NHK Radio Yearbook 1943},
	Editor = {{NHK}},
	Publisher = {NHK},
	Short = {Nenkan 1943},
	Title = {Shōwa 18-nen Rajio Nenkan [1943 Radio Yearbook]},
	Year = {1943}}

scokobro avatar Sep 22 '20 13:09 scokobro

I have a couple ideas.

  1. Disable MathJax for your manuscript. This is easy but only a viable option if you do not need MathJax elsewhere in your manuscript. In my local testing, I edited build/pandoc/defaults/html.yaml:
# Pandoc --defaults for HTML output.
# Load on top of common defaults.
to: html5
output-file: output/manuscript.html
include-after-body:
- build/themes/default.html
- build/plugins/anchors.html
- build/plugins/accordion.html
- build/plugins/tooltips.html
- build/plugins/jump-to-first.html
- build/plugins/link-highlight.html
- build/plugins/table-of-contents.html
- build/plugins/lightbox.html
- build/plugins/attributes.html
#- build/plugins/math.html
- build/plugins/hypothesis.html
- build/plugins/analytics.html
#variables:
#  math: ''
#html-math-method:
#  method: mathjax

and build/pandoc/defaults/pdf-weasyprint.yaml (may not be necessary):

# Pandoc --defaults for PDF output via weasyprint.
# Load on top of HTML defaults.
output-file: output/manuscript.pdf
pdf-engine: weasyprint
pdf-engine-opts:
- '--presentational-hints'
#html-math-method:
#  method: webtex
#  url: 'https://latex.codecogs.com/svg.latex?'

Commenting out these HTML math-related lines gives me the following in the HTML version of a local build: image

It's not perfect because the escaped brackets are still there. However, it's unlikely to break anything else.

  2. Edit the output HTML in the build script. After these lines in build/build.sh:
https://github.com/manubot/rootstock/blob/0964fd7ed7fdc1e3b468b5f4f1234d24de0b14fc/build/build.sh#L30-L36

add the following sed commands to convert \[ to [ and \] to ]:

sed -i 's/\\\[/[/g' output/manuscript.html
sed -i 's/\\\]/]/g' output/manuscript.html

This looks nice in my local build: image and image

However, there is a risk that this blunt replacement could break something else. You would likely be able to detect this in the HTML. I think it will fix the PDF version too when the PDF manuscript is built with Docker, as it is on GitHub. I don't think the WeasyPrint version would be fixed.
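
For anyone who would rather test the replacement outside the build script, here is a minimal Python sketch of the same blunt substitution the sed commands above perform. The function name and both sample strings are made up for illustration and are not taken from the rootstock build. The second sample shows the risk mentioned above: genuine display math written with \[ ... \] would lose its delimiters too.

```python
def unescape_brackets(html: str) -> str:
    """Replace backslash-escaped square brackets with plain ones (same effect as the two sed commands)."""
    return html.replace("\\[", "[").replace("\\]", "]")


# Hypothetical snippet of rendered citation HTML containing the escaped brackets.
citation = '<span class="citation">(Shōwa 18-nen rajio nenkan \\[1943 radio yearbook\\], 1943)</span>'
print(unescape_brackets(citation))

# Hypothetical display-math snippet: the blunt replacement strips its delimiters as well.
display_math = '<span class="math display">\\[x^2\\]</span>'
print(unescape_brackets(display_math))
```

If the manuscript does contain display math, a narrower replacement (for example, one limited to the references section) would be safer, but that is beyond what I tested.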

  3. Strip the \\[ from the converted JSON. This does not seem to work. I tested the following with the command manubot cite --render url:test --bibliography=test2.json --format=markdown, but Pandoc adds back the \[ when writing to markdown:
[
  {
    "editor": [
      {
        "literal": "NHK"
      }
    ],
    "id": "url:test",
    "issued": {
      "date-parts": [
        [
          1943
        ]
      ]
    },
    "publisher": "NHK",
    "publisher-place": "Tokyo",
    "title": "Shōwa 18-nen rajio nenkan [1943 radio yearbook]",
    "type": "book"
  }
]
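
For reference, the stripping step itself can be sketched in Python as below. This is only an illustration of how the cleaned JSON above could be produced from the converted JSON; the helper name is hypothetical, and as noted, cleaning the JSON alone did not fix the rendered output in my test because Pandoc re-escapes the brackets when writing markdown.

```python
import json


def strip_bracket_escapes(records):
    """Return copies of CSL JSON records whose titles have \\[ and \\] replaced by [ and ]."""
    cleaned = []
    for record in records:
        record = dict(record)  # shallow copy so the input list is left untouched
        title = record.get("title")
        if isinstance(title, str):
            record["title"] = title.replace("\\[", "[").replace("\\]", "]")
        cleaned.append(record)
    return cleaned


# Raw JSON text as produced by the conversion, with the brackets escaped.
converted = json.loads(
    r'[{"id": "url:test", "title": "Shōwa 18-nen rajio nenkan \\[1943 radio yearbook\\]", "type": "book"}]'
)
cleaned = strip_bracket_escapes(converted)
print(json.dumps(cleaned, ensure_ascii=False, indent=2))
```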

agitter avatar Sep 24 '20 21:09 agitter

The following seems like the correct title for the CSL JSON:

    "title": "Shōwa 18-nen rajio nenkan [1943 radio yearbook]",

So is the underlying issue that pandoc is prefixing brackets with backslashes in titles, causing MathJax to incorrectly treat text as an equation? @jgm any idea whether this is a bug we should report to pandoc or pandoc-citeproc, or whether there is a solution on the user's end that does not involve disabling MathJax entirely?
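
To make the suspected chain concrete, here is a small Python sketch. It assumes the title is stored exactly as in the JSON posted at the top of this issue, and it uses a simplified regular expression as a stand-in for MathJax's default \[ ... \] display-math delimiters; the real MathJax scanner is more involved, so treat this as an approximation.

```python
import json
import re

# Raw JSON text with the title escaped the way pandoc-citeproc wrote it.
raw_entry = r'{"id": "nhknenkan43", "title": "Shōwa 18-nen rajio nenkan \\[1943 radio yearbook\\]"}'

# JSON decoding turns each "\\" into a single backslash, so the title now contains \[ ... \].
title = json.loads(raw_entry)["title"]

# Simplified stand-in for MathJax's \[ ... \] display-math delimiters (an assumption, not MathJax code).
DISPLAY_MATH = re.compile(r"\\\[(.+?)\\\]", re.DOTALL)

match = DISPLAY_MATH.search(title)
print(title)
print(match.group(1) if match else None)
```

In this sketch the bracketed translation is exactly the span that gets picked up as math, which matches what the screenshots show.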

@scokobro, a quick and dirty solution might be to rename the title on your end to "Shōwa 18-nen rajio nenkan: 1943 radio yearbook". But we'd love to get this fixed in the long term.

dhimmel avatar Sep 26 '20 14:09 dhimmel

Well, I'm happy to report that (a) pandoc is transitioning to built-in citeproc support, so that pandoc-citeproc will no longer be needed; (b) the dev version of pandoc produces this with pandoc -f bibtex -t csljson:

[
  {
    "editor": [
      {
        "literal": "NHK"
      }
    ],
    "id": "nhknenkan43",
    "issued": {
      "date-parts": [
        [
          1943
        ]
      ]
    },
    "publisher": "NHK",
    "publisher-place": "Tokyo",
    "title": "Shōwa 18-nen rajio nenkan [1943 radio yearbook]",
    "title-short": "[CSL STYLE ERROR: reference with no printed form.]",
    "type": "book"
  }
]

(There's clearly an issue here involving title-short, which I'll have to look at, but at least the brackets are no longer escaped.) You can get a nightly from the jgm/pandoc repository (under Actions) if you want to test further.

jgm avatar Sep 26 '20 21:09 jgm

Thanks to everyone for your advice. The Chicago Manual of Style requires citations as footnotes with short titles, and requires that translations of titles of non-English (or 'other language') material appear in square brackets. I am relieved to say that for me this is a passing issue (as I generally wouldn't write for a history journal), but it will be important for anyone using translated reference materials and required to use CMS styles. In the meantime I'm happy to disable MathJax. Thanks again, and good luck with everything!

scokobro avatar Sep 28 '20 01:09 scokobro