obsidian-pandoc
Greek and Hebrew drop out when exporting to PDF
I'm using the Pandoc plugin in Obsidian, which converts Obsidian Markdown to LaTeX and then to PDF. If I export to docx, it works fine. But when I take the LaTeX --> PDF route, I can only make Everson Mono or Lucida Sans Unicode work (more or less). No other font renders the Hebrew, even though the docx export looks fine.
My YAML looks like this:
title: Notes on Tabernacle
author: Stuart Thiessen
lang: en-US
documentclass: extarticle
mainfont: Everson Mono
monofont: Everson Mono
toc: true
header-includes:
- \usepackage{tcolorbox}
toc-title: "Contents"
fontsize: 9pt
number-sections: true
name: Tabernacle
Your problem is probably more about LaTeX than Pandoc. I have a workflow that supports exporting Hebrew and Greek, with obsidian-pandoc handling the .md to .tex stage. For Greek and Hebrew, I use LuaLaTeX with automatic detection of Greek or Hebrew text, each of which gets its own font. If you are interested, I may be able to publish a demo somewhere.
According to what I've read, you have to use a font that includes Hebrew or Greek characters. The default LaTeX fonts do not include those character sets.
It looks like Limezy found a way to detect Hebrew or Greek and to use an appropriate font only for sections in those languages.
@thiessenstuart on an unrelated and funny note, I attended your presentation at Topeka Bible Church a few weeks ago!
@ParkerRobb Thank you! You are right. The font does need to include Hebrew and Greek characters. I was using the Greek and Hebrew fonts from the Society of Biblical Literature (https://www.sbl-site.org/educational/biblicalfonts.aspx) and from SIL International, but somehow they were not coming out quite right. But you are right (and I forgot to return here to mention) that it did turn out to be a LaTeX issue. I ended up having to set up YAML in my Obsidian Markdown file to include the necessary parameters that get passed to LaTeX to get it to render the Greek and Hebrew better. :)
Re: TBC ... Wow! Small world. :)
But Everson Mono is still the only font I have found so far that renders it right. So there is still something to figure out ... sometime.
> I ended up having to set up YAML in my Obsidian Markdown file to include the necessary parameters that get passed to LaTeX to get it to render the Greek and Hebrew better. :)
What did you end up putting in the YAML to get it to somewhat work?
> For Greek and Hebrew, I use LuaLaTeX with automatic detection of Greek or Hebrew text, each of which gets its own font. If you are interested, I may be able to publish a demo somewhere.
@Limezy would you mind sharing your workflow?
@ParkerRobb I'm still fine-tuning this rather complex workflow. I plan to publish it as a demo vault or long-form article for the (hopefully) upcoming Obsidian October contest!
@thiessenstuart I just found a solution that allows me to write RTL Hebrew text in a LTR line: take a look at the bottom reply at this forum thread. I got it to work by combining a couple things:
- Wrapping the Hebrew text in a <span lang=he> </span> HTML span (which is hidden in Obsidian Live Preview).
- Using Linux Libertine font for the whole document.
(I did not add any YAML metadata; the only Pandoc options I specified on the command line were --pdf-engine=xelatex -V papersize:letter -V mainfont="Linux Libertine O".)
Hope you can glean something helpful for your own situation.
Denis' answer on the above forum page talks about using different fonts for different languages, and I found more information in the Overleaf documentation, but I have yet to experiment with it. You might find both of those to be helpful, @Limezy.
@ParkerRobb thanks for your links!
- About writing RTL Hebrew text in an LTR line: I'm not sure I understand the problem. Obsidian allows this by default, without fancy invisible HTML tags, doesn't it?
- About compiling such mixed texts: the Overleaf documentation is indeed interesting, but recent versions of Babel and LuaLaTeX do even better, because the font and the LTR/RTL direction can be switched automatically depending on the characters. In my current template, I have one font for ancient Greek, one font for Hebrew, and one main font for the rest. When Hebrew characters are detected in the text, LuaLaTeX automatically switches to RTL and uses the dedicated Hebrew font. This automation is really handy, and it is something XeLaTeX is not capable of.
- LuaLaTeX by itself is pretty bad at handling Hebrew niqqud and cantillation marks, hence a tendency for people to use XeLaTeX instead. But a recent update made it possible to use the HarfBuzz text shaping engine within LuaLaTeX: this is the best of both worlds, with automatic detection of Hebrew characters and perfect handling of niqqud and cantillation.
I'll try to publish on this and share my templates shortly
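For readers who want to try the automatic switching described above, here is a minimal sketch of a LuaLaTeX document. It is not Limezy's actual template. It assumes a recent babel and luaotfload, and the font names (Linux Libertine O, SBL Hebrew, SBL Greek) are assumptions to replace with fonts installed on your system.

```latex
% Compile with lualatex. Font names below are assumptions; use fonts you have installed.
\documentclass{article}

% bidi=basic enables babel's bidirectional text support under LuaTeX.
\usepackage[bidi=basic, english]{babel}

% Load Hebrew and Greek locale data on demand. onchar=ids fonts tells babel to
% switch the language and the font when it meets characters of that script,
% so no explicit \foreignlanguage or lang tag is needed for these scripts.
\babelprovide[import, onchar=ids fonts]{hebrew}
\babelprovide[import, onchar=ids fonts]{greek}

% Main font for everything else.
\babelfont{rm}{Linux Libertine O}
% Dedicated Hebrew font, shaped with HarfBuzz for niqqud and cantillation marks.
\babelfont[hebrew]{rm}[Renderer=HarfBuzz]{SBL Hebrew}
% Dedicated Greek font.
\babelfont[greek]{rm}{SBL Greek}

\begin{document}
This is some text with טקסט בעברית in the middle, followed by λόγος in Greek.
\end{document}
```

In a Pandoc workflow, the preamble lines would typically go into a custom LaTeX template or into header-includes, with --pdf-engine=lualatex selected.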
> About writing RTL Hebrew text in an LTR line: I'm not sure I understand the problem. Obsidian allows this by default, without fancy invisible HTML tags, doesn't it?
@Limezy It works fine if I only have a single RTL word. But if I have a sequence of RTL words inside a predominantly LTR line, both Obsidian and Pandoc incorrectly reverse the word order (and sometimes put the niqqud under the wrong letters).
Can you please post a minimal non-working example here? I'm interested, as this seems quite unfortunate... For Obsidian, I guess it should be filed as a ProseMirror bug? For Pandoc, do you mean that RTL sections are reversed in the .tex output?
With long enough RTL parts inside LTR lines, I do get a strange line break, which is treated as LTR. But I guess this is normal? How else would the line break be treated in that case? Is that the problem you are talking about, @ParkerRobb?
[Screenshots: Obsidian, LuaLaTeX]
@Limezy I've attached a few files to demonstrate what I'm talking about: the original Markdown file, and the Pandoc-generated PDF.
RTL in LTR line test.md RTL in LTR line test.pdf
Here's the output of an intermediate LaTeX file, which you'll notice changes the HTML span to a \foreignlanguage span:
Notice the inline Hebrew words are reversed. The word \emph{b'Ivrit}
should be second when reading from right to left.
\hypertarget{without-html-span}{%
\section{Without HTML span}\label{without-html-span}}
This is some text with טקסט בעברית in the middle.
\hypertarget{with-html-span}{%
\section{With HTML span}\label{with-html-span}}
This is some text with \foreignlanguage{hebrew}{טקסט בעברית} in the
middle.
The word reversal is hard to reproduce in Obsidian. Sometimes Preview Mode switches the inline RTL words, and sometimes it doesn't. 🤷♂️
@ParkerRobb thanks !
As far as I understand, your problem is actually at the LaTeX compilation stage, not in Pandoc itself? Generating a PDF with Pandoc is possible, but it's only a shortcut: Pandoc launches a LaTeX compilation (XeLaTeX, with the option you passed) of the intermediate .tex file it has generated. Your .tex file above looks good, and I get exactly the same on my side.
Your problem should be fixable by using a slightly modified LaTeX template for Pandoc and LuaLaTeX as your engine. On my side, I prefer to restrict Pandoc to its own job, that is, producing the .tex file from my LaTeX template, and to use Latexmk for the .tex to PDF stage, which allows the multiple compilation runs needed for Biber / BibTeX.
Here is the output of your Markdown example after copy-pasting it into a document within my standard compilation workflow:
As you can see, you can get rid of these annoying HTML spans! I have never encountered word reversal so far. I'll soon publish my templates and scripts, which you should be able to adapt, either to keep using only Pandoc to drive the PDF compilation with LuaLaTeX as the engine, or to use a script that starts with Pandoc and continues with Latexmk, as I do.
> As far as I understand, your problem is actually at the LaTeX compilation stage, not in Pandoc itself?
> Your problem should be fixable by using a slightly modified LaTeX template for Pandoc and LuaLaTeX as your engine.
Interesting. I look forward to seeing the details of how you handle it!
Have you ever encountered word reversing in Obsidian, @Limezy?
@ParkerRobb sure, I'll send you a link to my LaTeX template and compilation script, even though they are not yet polished enough to be sent to a wider audience within the Obsidian community.
No, I have never experienced any Hebrew or RTL word reversal in Obsidian, except for the line-break case I described above. To be very honest though, I'm not good at Hebrew, and most of the work I do is to help other members of our Anthropologie biblique group, who are the ones who really need to work with and compile Hebrew inside Obsidian. They have never complained about any word reversal so far.
@Limezy have you been able to organize your solution enough to share?
@ParkerRobb here is a first link where you should find everything, though it is still a bit disorganized and will probably require some work on your side. In theory, if you use a Mac, you can just download the files, install the fonts, set up your Obsidian, and it will work right away: https://travaux.anthropologiebiblique.fr/s/obsidian-latex-workflow
Nice, thanks! I'll dig in and see what I can figure out.
Hi @thiessenstuart and @Limezy,
I wanted to share with you guys some things I've figured out and conclusions I've reached regarding our multilingual typesetting.
After doing quite a bit of research (and reporting a bug in Pandoc) in the months since I entered this discussion, I've pieced together an Obsidian and Pandoc configuration and workflow that works great for me. In short, I've decided to continue using the HTML lang tags for a couple of reasons:
- My Obsidian consistently displays Hebrew correctly, i.e. in the right word order and right-aligned (for blocks), only if it's tagged with the language. The HTML tags are hidden in both Reading and Live Preview modes, so their presence isn't a problem for me.
- LaTeX and the Babel package can autodetect the language of content, as you mentioned @Limezy, but not if the languages use the same script. If this is the case, the different languages must be explicitly distinguished via tags. Distinguishing languages that use the same script is important for loading the correct hyphenation patterns.
Your method does work, @Limezy, if every language uses a different script. But especially due to the second point, I decided to pursue a more generalized approach that works for any combination of languages, and is not limited to distinguishing languages by script.
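To illustrate the second point, here is a minimal sketch of explicit language tagging for two languages that share the Latin script. The choice of German and the sample words are assumptions for illustration only. Script-based autodetection cannot tell these languages apart, so the tags are what select the hyphenation patterns.

```latex
% Works with pdflatex, xelatex, or lualatex.
\documentclass{article}

% Load both languages; the last option (english) is the main document language.
\usepackage[ngerman, english]{babel}

\begin{document}
% Inline tag: the German word is hyphenated with German patterns.
An English sentence that mentions the
\foreignlanguage{ngerman}{Donaudampfschifffahrtsgesellschaft} in passing.

% Block tag: a whole paragraph switched to German.
\begin{otherlanguage}{ngerman}
Ein ganzer Absatz auf Deutsch, der nach deutschen Trennmustern umbrochen wird.
\end{otherlanguage}
\end{document}
```

This is the same kind of markup that appears in the intermediate LaTeX earlier in the thread, where the lang=he span became \foreignlanguage{hebrew}{...}.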
If you want to know more, I have uploaded detailed documentation of my multilingual configuration with plenty of references and footnotes.
This looks like impressive work, @ParkerRobb. Many thanks! I'll have a look when I can.