pandoc Org-mode reader: `#+pandoc-emphasis-pre` doesn't work as expected

Adding characters to `#+pandoc-emphasis-pre`, as described in the manual, doesn't work as expected: the added characters are still not accepted immediately before emphasis. Interestingly, adding characters to `#+pandoc-emphasis-post` does work.

Minimal working example test.org:

#+pandoc-emphasis-pre: "-\t ('\"{T"
#+pandoc-emphasis-post: "-\t\n .,:!?;'\")}[t"

1. T/est/ with T allowed as pre
2. /Tes/t with t allowed as post
3. Normal /emphasis/, and in {/brackets/}

Command: `pandoc -o test.md test.org`

Expected result in `test.md`:

1.  T*est* with T allowed as pre
2.  *Tes*t with t allowed as post
3.  Normal *emphasis*, and in {*brackets*}

Actual result:

1.  T/est/ with T allowed as pre
2.  *Tes*t with t allowed as post
3.  Normal *emphasis*, and in {*brackets*}
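For reference, the boundary rule these two settings are meant to control can be sketched as a toy model: an emphasis span opens only if the character just before the opening marker is a pre char (or there is none), and closes only if the character just after the closing marker is a post char (or there is none). This is not pandoc's implementation; it handles only the `/` marker on a single line, and the names `preChars`, `postChars`, and `convertLine` are hypothetical.

```haskell
import Data.Char (isSpace)
import Data.Maybe (listToMaybe)

-- Hypothetical toy model, NOT pandoc's Org reader.
-- Character sets copied from the MWE's #+pandoc-emphasis-pre/-post lines.
preChars, postChars :: String
preChars  = "-\t ('\"{T"
postChars = "-\t\n .,:!?;'\")}[t"

-- | Rewrite Org-style /emphasis/ as Markdown-style *emphasis*, applying
-- the pre/post rule. Simplified: one marker ('/'), one line, no nesting.
convertLine :: String -> String -> String -> String
convertLine pre post = go Nothing
  where
    -- prev is the character just before the current position, if any.
    go :: Maybe Char -> String -> String
    go _ [] = []
    go prev ('/' : rest)
      | maybe True (`elem` pre) prev          -- pre-char check
      , Just (body, after) <- closeAt rest    -- a valid closing marker exists
      = '*' : body ++ "*" ++ go (Just '/') after
    -- Anything else (including a '/' that may not open) is copied verbatim.
    go _ (c : rest) = c : go (Just c) rest

    -- The next '/' closes the span if the body is non-empty, has no
    -- whitespace at either border, and is followed by a post char or nothing.
    closeAt :: String -> Maybe (String, String)
    closeAt s = case break (== '/') s of
      (body@(b : _), '/' : after)
        | not (isSpace b) && not (isSpace (last body))
        , maybe True (`elem` post) (listToMaybe after)
        -> Just (body, after)
      _ -> Nothing
```

In GHCi, `convertLine preChars postChars "T/est/ with T allowed as pre"` gives `"T*est* with T allowed as pre"`, the behavior expected above. Dropping `T` from the set, as in `convertLine (filter (/= 'T') preChars) postChars "T/est/"`, leaves the text unchanged, which matches what the reader actually produces even with `T` in the set.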

Converting to the Pandoc AST instead confirms that the problem lies in the Org reader, not in the Markdown writer.

Completely replacing the two strings with "T" and "t", respectively, gives the same result.

Pandoc version: 2.18 on Manjaro Linux (pandoc-2.18-linux-amd64.tar.gz from the release page). The problem also occurs on https://pandoc.org/try.

May 07 '22 16:05 adql

The relevant test in `tests/Tests/Readers/Org/Meta.hs`:

[ "Changing pre and post chars for emphasis" =:
  T.unlines [ "#+pandoc-emphasis-pre: \"[)\""
            , "#+pandoc-emphasis-post: \"]\\n\""
            , "([/emph/])*foo*"
            ] =?>
  para ("([" <> emph "emph" <> "])" <> strong "foo")

This test, which adds the non-standard `[` to the pre chars, passes. Perhaps the bug only affects alphanumeric characters: I tried modifying the test to use `T`, `t`, and `3`, and all of them fail.

Jun 18 '22 19:06 adql

I tried more cases, and the problem seems more general than I first thought. So far, the only characters I have managed to use as pre chars are various parentheses, `$`, and `+`. Alphanumeric characters, `!`, `%`, and `#` all fail, and probably others do too.

Jun 20 '22 10:06 adql

Related issue: #6070

Jun 20 '22 10:06 tarleb

Tracing `orgStateEmphasisPreChars` in the parser state shows that it is updated as expected after `#+pandoc-emphasis-pre:`, including with characters that still fail to be accepted before emphasis. The problem must therefore lie in the parsing of the emphasis itself.
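One hypothesis that would fit both findings (the pre-char set is updated correctly, yet only some characters work) is that the reader checks the pre set only for characters consumed one at a time as special symbols, while ordinary characters are consumed in bulk as plain text and never checked. The toy model below illustrates that mechanism. It is not pandoc's code: `specialChars`, its contents, and `canOpenAfter` are assumptions, with the special set chosen to mirror the characters reported to work.

```haskell
-- Hypothetical model of the suspected mechanism, NOT pandoc's code.
-- Assumed set of "special" chars that are parsed one at a time.
specialChars :: String
specialChars = "*/_=~+[]{}()<>$ \t"

-- | Simulate consuming `prefix` and report whether an emphasis marker
-- placed right after it would be allowed to open.
canOpenAfter :: String -> String -> Bool
canOpenAfter pre prefix =
  null prefix || go 0 Nothing prefix == Just (length prefix)
  where
    -- n: chars consumed so far; lastPre: position just after the last
    -- char that was recognised as a pre char.
    go :: Int -> Maybe Int -> String -> Maybe Int
    go _ lastPre [] = lastPre
    go n lastPre s@(c : cs)
      | c `elem` specialChars =
          -- symbol path: the pre-char check happens only here
          go (n + 1) (if c `elem` pre then Just (n + 1) else lastPre) cs
      | otherwise =
          -- plain-text path: a whole run of ordinary chars is consumed
          -- at once, and none of them is checked against the pre set
          let (run, rest) = break (`elem` specialChars) s
          in go (n + length run) lastPre rest
```

With the MWE's pre set, `canOpenAfter "-\t ('\"{T" "1. T"` is `False` even though `T` is in the set, while `canOpenAfter "$" "a $"` is `True`. If the real reader behaves like this, the fix would be to also check the last character of a plain-text run against the pre set.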

Jun 23 '22 11:06 adql