Inconsistent Handling of Language Specifier for `djot`

Open mobotsar opened this issue 7 months ago • 1 comments

The issue is with processing djot (.dj) files. Output of language specifiers on code blocks is incorrect.

The command I use is `pandoc -r djot -w html in.dj`.

As a minimal example, I have a file called in.dj with the following contents.

```ocaml
let z = let x, y = 2, 3 in x + y ;;

let () = print_int z ;;
```

Note the ocaml specifier. Running the command above, I get

<div class="sourceCode" id="cb1"><pre
class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> z = <span class="kw">let</span> x, y = <span class="dv">2</span>, <span class="dv">3</span> <span class="kw">in</span> x + y ;;</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">let</span> () = <span class="dt">print_int</span> z ;;</span></code></pre></div>

Okay, that's wordy, but basically fine output; looks like it interpreted the language specifier as class="sourceCode ocaml" and pasted that around.

If I change the language, however, replacing `ocaml` in the above djot file with `lean4`, the output changes dramatically, to this:

<pre class="lean4"><code>let z = let x, y = 2, 3 in x + y ;;

let () = print_int z ;;
</code></pre>

Looks like it didn't recognize the language (why would it be trying to do that in the first place..?) and so totally skipped the `div`, the `sourceCode` class, and all that stuff with `<span>` and `aria-hidden`. I checked this by trying `java` and `c` and then `skjgnihenihnskldgklagherugnkrl` and `ajk8979Hugnyg`; the first two gave me the div/span stuff and the second two behaved like `lean4`. It seems clear to me that I should be getting `class="sourceCode lean4"` and whatnot: the djot reference implementation does not make any distinctions between language specifiers.

Anyway, this inconsistency is breaking my post-processing for static syntax highlighting.

I'm using the latest version of pandoc, downloaded from the "releases" section of this page just a couple days ago, on Fedora 42.

mobotsar avatar Jun 14 '25 01:06 mobotsar

I'm happy to fix this if someone will point me in the right direction, btw. Or to wait on a fix, whichever is all-around easier.

mobotsar avatar Jun 14 '25 01:06 mobotsar

Pandoc has its own source-code highlighting (and no support for lean4 at the moment). If you are doing your own highlighting, then just use --no-highlight and you'll get consistent output similar to your lean4 example above.

jgm avatar Jun 16 '25 18:06 jgm
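
If a post-processing step has to cope with both output shapes shown in this thread (for instance when `--no-highlight` is not an option), one way to do it is to read the language and the plain source text back out of each `<pre>` element. The following is only a rough Python sketch, not part of pandoc: the names `CodeBlockExtractor` and `extract_code_blocks` are made up for illustration. It assumes the language is the first class on `<pre>` other than `sourceCode`, which matches the two outputs quoted above but is not a documented guarantee.

```python
from html.parser import HTMLParser


class CodeBlockExtractor(HTMLParser):
    """Collect (language, source) pairs from <pre> blocks in HTML output.

    Hypothetical helper: it relies only on the two output shapes quoted
    in this thread, not on any documented pandoc contract.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []      # finished (language, source) pairs
        self._lang = None     # language of the <pre> currently being read
        self._chunks = None   # text pieces of the current <pre>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "pre" and self._chunks is None:
            classes = (dict(attrs).get("class") or "").split()
            # Assumption: "sourceCode" is the highlighter's wrapper class,
            # and the first remaining class is the language specifier.
            langs = [c for c in classes if c != "sourceCode"]
            self._lang = langs[0] if langs else None
            self._chunks = []

    def handle_data(self, data):
        # Inside <pre>, keep only text; <span> and <a> tags are dropped.
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "pre" and self._chunks is not None:
            self.blocks.append((self._lang, "".join(self._chunks)))
            self._lang, self._chunks = None, None


def extract_code_blocks(html):
    """Return a list of (language, source) tuples, one per <pre> block."""
    parser = CodeBlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Calling `extract_code_blocks` on either HTML snippet above should give one tuple with the language (`ocaml` or `lean4`) and the unhighlighted code. The highlighted form has no trailing newline inside `<code>` while the plain form does, so compare the two after stripping trailing newlines.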