org-mode Reader/Metadata: abstract

Open michelk opened this issue 2 years ago • 7 comments

In an org-mode file, could we parse

#+BEGIN_ABSTRACT

My abstract

#+END_ABSTRACT

as the metadata field abstract?

Right now, I have a --metadata-file with

---
abstract: |
   My abstract
---

which works, but it would be nicer to have it in one document.

cc @tarleb

michelk avatar Jul 27 '22 07:07 michelk

Seems reasonable, given that org-latex-export-as-latex converts

#+begin_abstract
My abstract
#+end_abstract

to

\begin{abstract}
My abstract
\end{abstract}

Casing seems important though, so I think we should only treat a lowercase abstract environment that way.

Obligatory Lua filter as a temporary workaround:

function Pandoc (doc)
  doc.blocks = doc.blocks:walk{
    Div = function (div)
      -- remove this to get case-sensitive behavior
      --            vvvvvvvvvvvvvvvvvv
      if div.classes:map(string.lower)[1] == 'abstract' then
        doc.meta.abstract = div.content
        return {}
      end
    end
  }
  return doc
end

tarleb avatar Jul 27 '22 10:07 tarleb

On further investigation it appears this is far from a special case (Org documentation section 13.10.10):

For other special blocks in the Org file [including abstract], the LaTeX export back-end makes a special environment of the same name.

So this raises the question: do we really want to add a special case for a block which org-mode itself does not specifically support? As an alternative, I can imagine converting any unknown Org block to a metadata field, similarly to how org-mode converts all unknown blocks to the corresponding LaTeX environments.

bradrn avatar Jul 29 '22 00:07 bradrn

This should not be a special case. When exporting to LaTeX, Org converts any unknown block type, e.g. #+begin_asdf, to a \begin{asdf} environment, and when exporting to HTML it produces <div class="asdf">.

tgbugs avatar Jul 30 '22 05:07 tgbugs

Yes, so my point was: given this is not a special case, how precisely should it be exported to Markdown?

bradrn avatar Jul 30 '22 07:07 bradrn

I don't think markdown code blocks would work because those need to translate as #+begin_src asdf (iirc). Maybe you could use a fenced code block with a type annotation for this? Embedding html and using the <div class='asdf'> won't roundtrip without significant issues.

tgbugs avatar Jul 30 '22 07:07 tgbugs

Unfiltered thoughts: We generally try to do what's closest to the author's intent; I wouldn't expect many situations in which #+BEGIN_abstract was used for something other than an article abstract. However, we've usually opted to match the HTML exporter in the past, so matching the LaTeX exporter just for this case would be slightly unprincipled.

Another thing to consider is that we'd lose information, namely the position in the text, if we were to move the abstract block from the body into the metadata. That behavior might be undesired. Additionally, we'd have to add support for abstracts to the org writer for symmetry.

tarleb avatar Jul 30 '22 17:07 tarleb

I lean towards adding built-in support for abstracts; I don't feel strongly about this though, given that the above filter offers a reasonable alternative solution.

Feel free to indicate your preference by adding a reaction to this comment (:+1: implement -- :-1: don't)

tarleb avatar Jul 30 '22 17:07 tarleb

Cool, thanks a lot.

michelk avatar Aug 30 '22 09:08 michelk

I used to convert Org files to LaTeX with versions before 2.19.1, and the Lua filter listed above detected the abstract block (#+BEGIN_ABSTRACT). With later versions it no longer works unless I use #+begin_abstract. I am confused, because it sounds like the request was about avoiding case-sensitive keywords?

cadamosto avatar Apr 01 '23 12:04 cadamosto

Indeed, that was an oversight. Fixed in ef16a88cdec6e7fb48142ae74ef3811e4fe749a7.

tarleb avatar Apr 01 '23 15:04 tarleb
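
For readers who still rely on a filter, here is a minimal sketch of a variant of the workaround above that addresses two points raised in the thread: it matches the abstract class regardless of case, and it copies the block into the metadata instead of removing it, so the position in the body is not lost. The helper name has_abstract_class is hypothetical and not part of pandoc, and the sketch assumes a pandoc version in which doc.blocks:walk is available, as the original filter does. It has not been checked against the fix in the commit mentioned above.

```lua
-- Hypothetical helper (not part of pandoc): returns true if any of the
-- given class names equals "abstract", ignoring case.
function has_abstract_class (classes)
  for _, class in ipairs(classes) do
    if class:lower() == 'abstract' then
      return true
    end
  end
  return false
end

function Pandoc (doc)
  -- Walk the body only to inspect it; the result is discarded, so the
  -- abstract block stays where the author put it.
  doc.blocks:walk{
    Div = function (div)
      -- Assumption: an abstract already set in the metadata (for example
      -- via --metadata-file) takes precedence over the block.
      if has_abstract_class(div.classes) and not doc.meta.abstract then
        doc.meta.abstract = div.content
      end
    end
  }
  return doc
end
```

Saved under an illustrative name such as abstract.lua, it would be used as pandoc --lua-filter=abstract.lua input.org -o output.tex. Whether the abstract should then appear both in the body and in the metadata depends on the output template, so dropping the block (as the original filter does) may still be preferable for LaTeX output.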