org-parser icon indicating copy to clipboard operation
org-parser copied to clipboard

org-parser is a parser for the Org mode markup language for Emacs.

  • General

#+html:

Tests:

#+html: Clojars Project

Community chat: #organice on IRC [[https://libera.chat/][Libera.Chat]], or [[https://matrix.to/#/!DfVpGxoYxpbfAhuimY:matrix.org?via=matrix.org&via=ungleich.ch][#organice:matrix.org]] on Matrix

** What does this project do?

=org-parser= is a parser for the [[https://orgmode.org/][Org mode]] markup language for Emacs.

It can be used from JavaScript, Java, Clojure and ClojureScript!

** Why is this project useful / Rationale

Org mode in Emacs is implemented in [[http://git.savannah.gnu.org/cgit/emacs/org-mode.git/tree/lisp/org-element.el][org-element.el]] ([[https://orgmode.org/worg/dev/org-element-api.html][API documentation]]). The [[https://orgmode.org/worg/dev/org-syntax.html][spec for the Org syntax is written in prose]].

This is already great work, yet it has some drawbacks:

  1. The spec is not machine readable. Hence, there can be drift between documentation and implementation. In fact, during the development of [[https://github.com/200ok-ch/organice/][organice]], our web-based Org implementation with great mobile phone support, and =org-parser= we have encountered drift.
  2. org-element.el is naturally written in Emacs lisp and makes strong use of Emacs as a text-processor. Hence, its code can only be used within Emacs.

While writing the official spec already is an amazing effort in the standardization of the Org format, the power of Org is so enticing that many want to use it outside of Emacs, as well. Since org-element.el only runs in Emacs, this caused a myriad of implementations for other platforms (JavaScript, Rust, Go, Java, etc) to have been created. Most implementations are only partial, and more importantly each of them creates another island. Since they are just as programming language dependent as org-element.el, it is impossible to share logic between them.

=org-parser= aims at alleviating both these issues. It documents the syntax in a standard and machine readable notation (EBNF). And the reference implementation is done in a way that it runs on the established virtual machines of Java and JavaScript. Hence, =org-parser= can be used from all programming languages running on those virtual machines. =org-parser= provides a higher-level data structure that is easy to consume for an application working with Org mode data. Even if your application is not running on the Java or JavaScript virtual machines, you can embed =org-parser= as a command-line application. Lastly, =org-parser= brings a strong test suite to document the reference implementation in yet another unambiguous way.

It is our aim that =org-parser= can be the foundation on which many Org mode applications in many different languages can be built. The applications using =org-parser= can then focus on implementing user facing features and don’t have to worry about the implementation of the Org syntax itself.

** Architecture

The code base of =org-parser= is split into four namespaces:

  • org-parser.core (top level api, i.e. =read-str=, =write-str=)
  • org-parser.parse (aka. deserializer, reader)
  • org-parser.parse.transform (transforms the result of the parser into a more desirable structure)
  • org-parser.render (aka. serializer, writer)

Thus =org-parser= has become a misnomer in the sense, that it now strives to be =clojure/data.org= (after the pattern of existing Clojure libraries like =data.json=, =data.xml=, =data.csv=, etc) providing reader as well as writer capabilities for the serialization format =org=.

  • Project State

This project is work-in-progress. It is not ready for production yet because the structure of the AST (parse tree) can still change.

The biggest milestones are:

  • [X] Finish [[http://xahlee.info/clojure/clojure_instaparse_grammar_syntax.html][EBNF parser]] to support most Org mode syntax - [X] Headlines - [X] Org mode =#+*= stuff - [X] Timestamps - [X] Links - [X] Text links - [X] Footnotes - [X] Styled text - [X] Drawers and =#+BEGIN_xxx= blocks - [ ] Nested markup (see #12)
  • [X] Setup basic transformation from the parse tree to a higher-level structure.
  • [-] Transformations to higher-level structure: catch up with features that are already supported by the EBNF parser.
  • [-] Render parsed org file with =write-str=

It can already be useful for you: E.g. if your script needs to parse parts of Org mode features, our EBNF parser probably already supports that. Do not underestimate e.g. timestamps. Use our well-tested parser to disassemble it in its parts, instead of trying to write a poor and ugly regex that is only capable of a subset of Org mode's timestamps ;)

Don't hesitate to contribute!

  • Development

=org-parser= uses [[https://github.com/Engelberg/instaparse/][instaparse]] which aims to be the simplest way to build parsers in Clojure. Apart from living up to this claim (and beyond the scope of just the one programming language), using instaparse is great for another reason: Instaparse works both on CLJ and CLJS. Therefore =org-parser= can be used from both ecosystems which, of course, include JavaScript and Java. Hence, it is possible to [[#usage][use it]] in various situations.

** Prerequisites

Please install [[https://clojure.org/guides/getting_started][Clojure]] and [[https://leiningen.org/][Leiningen]].

There's no additional installation required. Leiningen will pull dependencies if required.

** Testing

Running the tests:

#+BEGIN_SRC shell

Clojure

lein test

CLJS (starts a watcher)

lein doo node #+END_SRC

If you're not familiar with Lisp or Clojure, here's a short video on how the tooling for Lisp (and hence Clojure) is great and enables fast developer feedback and high quality applications. Initially, the video was created to answer a [[https://github.com/200ok-ch/org-parser/issues/4][specific issue]] on this repository. However, the question is a valid general question that is asked quite often by people who haven't used a Lisp before.

[[https://raw.githubusercontent.com/200ok-ch/org-parser/master/doc/images/quick_introduction_to_lisp_clojure_and_using_the_repl.jpg]]

You can watch it here: https://youtu.be/o2MLHFGUkoQ

  • Release and Dependency Information

Note: The version number should be replaced with the current version of org-parser. See the clojars badge at the [[https://github.com/200ok-ch/org-parser#general][top of this README]].

** [[https://clojure.org/reference/deps_and_cli][CLI/deps.edn]] dependency information:

#+BEGIN_SRC org-parser/org-parser {:mvn/version "0.1.4"} #+END_SRC

** [[https://github.com/technomancy/leiningen][Leiningen]] dependency information:

#+BEGIN_SRC [org-parser "0.1.4"] #+END_SRC

  • Usage :PROPERTIES: :CUSTOM_ID: usage :END:

At the moment, you can run =org-parser= from Clojure, ClojureScript, or Java. Other targets which are hosted on the JVM or on JavaScript are possible.

** Clojure Library

#+BEGIN_SRC clojure :exports both (ns hello-world.core (:require [org-parser.parser :refer [parse]] [org-parser.core :refer [read-str write-str]]))

(prn (parse "* Headline")) (prn (read-str "* Headline")) (println (write-str (read-str "* Headline"))) #+END_SRC

#+RESULTS: | [:S [:headline [:stars ""] [:text [:text-normal "Headline"]]]] | | {:headlines [{:headline {:level 1, :title [[:text-normal "Headline"]], :planning [], :tags []}}]} | | " Headline\n" |

** Clojure

Run =lein run file.org=, for example:

#+begin_src sh :results verbatim :exports both lein run test/org_parser/fixtures/schedule_with_repeater.org #+end_src

#+RESULTS: : {:headlines [{:headline {:level 1, :title [[:text-sty-bold "Header"] [:text-normal " with repeater"]], :planning [[:planning-info [:planning-keyword [:planning-kw-scheduled]] [:timestamp-active [:ts-inner [:ts-inner-wo-time [:ts-date "2019-11-27"] [:ts-day "Wed"]] [:ts-modifiers [:ts-repeater [:ts-repeater-type "+"] [:ts-mod-value "1"] [:ts-mod-unit "d"]]]]]]], :tags []}}]}

** Java

First, compile =org-parser= with:

#+begin_src sh :exports code :results silent :tangle buildjar.sh lein uberjar #+end_src

Then run =java -jar target/uberjar/org-parser-*-SNAPSHOT-standalone.jar file.org=, for example:

#+begin_src sh :results verbatim :exports both :tangle testjar.sh java -jar target/uberjar/org-parser-*-SNAPSHOT-standalone.jar test/org_parser/fixtures/schedule_with_repeater.org #+end_src

#+RESULTS: : {:headlines [{:headline {:level 1, :title [[:text-sty-bold "Header"] [:text-normal " with repeater"]], :planning [[:planning-info [:planning-keyword [:planning-kw-scheduled]] [:timestamp-active [:ts-inner [:ts-inner-wo-time [:ts-date "2019-11-27"] [:ts-day "Wed"]] [:ts-modifiers [:ts-repeater [:ts-repeater-type "+"] [:ts-mod-value "1"] [:ts-mod-unit "d"]]]]]]], :tags []}}]}

Note: The =*= character must be replaced with the current version number of org-parser. See the clojars badge at the [[https://github.com/200ok-ch/org-parser#general][top of this README]].

  • License