Implement Neorg converter
There is a cool new project called Neorg that is seeing a lot of work very quickly. You can check out the README, but in a nutshell it's a new take on the org-mode style file format, this time for Neovim and written in Lua. I like it a lot so far and I think it would be very useful to have a Pandoc converter for it. I am interested in Haskell and would be happy to write it on my own, but I don't know a single thing about it and can't find anything in the docs about making your own reader/writer.
Do you guys think this is a good idea? What do I need to know in Haskell before attempting to do it for myself?
Interesting! I wrote the org-mode reader, it was my first "real world" Haskell experience. So I think it's definitely possible.
The neorg format seems somewhat similar to org, so a good start might be to add neorg as an org-mode "extension" so one could call pandoc with pandoc --from=org+neorg. The extension would have to enable neorg-specific syntax and disable unsupported org-mode features.
The other option would be to implement a converter in Lua: one could convert from the neorg AST to the JSON representation of pandoc's AST. This would all happen in neovim, and the result would be fed into pandoc. The conversion results would likely be even better than with a separate Haskell reader.
I started to write an intro to pandoc's code and architecture, it might be a good starting point regardless of which option you'd like to go for: #7316
Note also that we'd want to wait til neorg gets more traction (or doesn't) before deciding to include a dedicated reader/writer. For now, the solution @tarleb mentions of implementing a converter in lua sounds very promising. Especially since neorg is already implemented in lua; you could just hook in to its own parsing.
This is awesome, thank you so much for responding. And yes I think it would be best to start off with Lua. There is already a Treesitter parser for the norg file format, is there any way to leverage that to make the Pandoc conversions easier?
Edit: I also got started tweaking the Org parser and got some of it working... I like the idea of adding the Norg parser on top of it since the syntax is already very similar
There is already a Treesitter parser for the norg file format, is there any way to leverage that to make the Pandoc conversions easier?
There should be. If the parser produces some kind of AST, then you just need to convert that AST to a JSON representation of the pandoc AST. (See pandoc -t json output to see what kind of thing you'll want to generate.) This ought to be far easier than writing a parser from scratch.
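A minimal sketch of that conversion, written in Python only for illustration (the same walk could be done in Lua inside Neovim). The input tree and its node type names ("document", "heading", "paragraph") are hypothetical and are not what the norg tree-sitter grammar produces; the API version is copied from sample pandoc output and should match whatever your own pandoc emits with -t json.

```python
import json

# Hypothetical input: a parse tree reduced to plain dicts. The node type
# names are made up for illustration, not taken from the norg grammar.
tree = {
    "type": "document",
    "children": [
        {"type": "heading", "level": 1, "text": "Hello world"},
        {"type": "paragraph", "text": "Some text"},
    ],
}

def inlines(text):
    """Turn plain text into pandoc inlines: Str for words, Space between them."""
    out = []
    for word in text.split():
        if out:
            out.append({"t": "Space"})
        out.append({"t": "Str", "c": word})
    return out

def block(node):
    """Map one hypothetical tree node to one pandoc block element."""
    if node["type"] == "heading":
        attr = ["", [], []]  # identifier, classes, key-value pairs
        return {"t": "Header", "c": [node["level"], attr, inlines(node["text"])]}
    if node["type"] == "paragraph":
        return {"t": "Para", "c": inlines(node["text"])}
    raise ValueError("unhandled node type: " + node["type"])

def to_pandoc(tree, api_version=(1, 23)):
    """Wrap the converted blocks in the top-level pandoc JSON document."""
    return {
        "pandoc-api-version": list(api_version),
        "meta": {},
        "blocks": [block(child) for child in tree["children"]],
    }

doc = to_pandoc(tree)
print(json.dumps(doc))
```

The printed JSON can then be piped into pandoc -f json to produce any output format.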
So Treesitter's AST is pretty good, I'm just not sure how I could convert it to Pandoc's. Pandoc's JSON representation of the AST is a lot less dense than tree-sitter's, and I don't think it would be possible to just convert one to another. Is it viable to just convert the tree-sitter AST to Pandoc's? I'm not sure how the Pandoc AST actually looks in a plaintext format, is there a way to print it, or any examples of it online?
Yes, you can render the pandoc AST from any document by using -t native (this gives you the native Haskell version) or -t json (for the JSON marshalling).
Awesome, thank you. I will get to work on it :)
Hello, I would like to continue implementing this Neorg converter, as I have some insight into Lua and tree-sitter. My objective is to generate JSON for pandoc.
See pandoc -t json output to see what kind of thing you'll want to generate.
What kind of JSON should I generate? Is there a description of which JSON keys pandoc requires? Edit: is this what I'm looking for?: https://hackage.haskell.org/package/pandoc-types-1.22/docs/Text-Pandoc-Definition.html
That is the definition (in Haskell) of the pandoc AST. The JSON serialization of this is not documented, but predictable from some samples. You can use pandoc itself to see how org is converted to AST elements, and how these are turned into JSON, e.g.
% pandoc -f org -t native
* Hello
1. one
2. two
^D
[ Header 1 ( "hello" , [] , [] ) [ Str "Hello" ]
, OrderedList
( 1 , DefaultStyle , DefaultDelim )
[ [ Plain [ Str "one" ] ] , [ Plain [ Str "two" ] ] ]
]
And then
% pandoc -f native -t json
[ Header 1 ( "hello" , [] , [] ) [ Str "Hello" ]
, OrderedList
( 1 , DefaultStyle , DefaultDelim )
[ [ Plain [ Str "one" ] ] , [ Plain [ Str "two" ] ] ]
]
^D
{"pandoc-api-version":[1,23],"meta":{},"blocks":[{"t":"Header","c":[1,["hello",[],[]],[{"t":"Str","c":"Hello"}]]},{"t":"OrderedList","c":[[1,{"t":"DefaultStyle"},{"t":"DefaultDelim"}],[[{"t":"Plain","c":[{"t":"Str","c":"one"}]}],[{"t":"Plain","c":[{"t":"Str","c":"two"}]}]]]}]}
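To make the pattern in that output explicit, here is a small sketch that rebuilds the same document by hand. It is written in Python for illustration; the helper names el and attr are my own and not part of any pandoc API. The pattern it assumes, read off the sample above, is that each constructor becomes an object with a "t" tag and, when it has arguments, a "c" field holding them.

```python
import json

def el(tag, content=None):
    """Serialize a constructor: {"t": tag} alone, or with its arguments under "c"."""
    return {"t": tag} if content is None else {"t": tag, "c": content}

def attr(identifier="", classes=(), keyvals=()):
    """Attr is a triple: identifier, list of classes, list of key-value pairs."""
    return [identifier, list(classes), [list(kv) for kv in keyvals]]

# Header 1 ( "hello" , [] , [] ) [ Str "Hello" ]
header = el("Header", [1, attr("hello"), [el("Str", "Hello")]])

# OrderedList ( 1 , DefaultStyle , DefaultDelim ) [ [ Plain ... ] , [ Plain ... ] ]
ordered = el("OrderedList", [
    [1, el("DefaultStyle"), el("DefaultDelim")],
    [[el("Plain", [el("Str", "one")])], [el("Plain", [el("Str", "two")])]],
])

doc = {"pandoc-api-version": [1, 23], "meta": {}, "blocks": [header, ordered]}
print(json.dumps(doc, separators=(",", ":")))
```

Parsing the printed JSON should give the same structure as the pandoc output above, which is a quick way to check a hand-written serializer against real samples.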
@jgm Thanks a lot for your input. How can I test my JSON by generating output in a different format? Do you provide a way to do that?
Sure, you can use pandoc again:
pandoc -f json -t native
or
pandoc -f json -t org
You'll get an error if the json isn't valid.
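If you want to script that check, something like the following Python sketch should work. It assumes pandoc is on the PATH and that the API version in the document matches the installed pandoc; the function names are hypothetical, and the only pandoc flags used are the -f json and -t ones shown above.

```python
import json
import shutil
import subprocess

def pandoc_command(target="native"):
    """Build the command line from above: read JSON on stdin, write `target`."""
    return ["pandoc", "-f", "json", "-t", target]

def render(doc, target="native"):
    """Pipe a JSON-serializable AST through pandoc; raise if pandoc rejects it."""
    result = subprocess.run(
        pandoc_command(target),
        input=json.dumps(doc),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        raise ValueError("pandoc rejected the JSON: " + result.stderr.strip())
    return result.stdout

doc = {
    # Assumption: copy this version from your own `pandoc -t json` output.
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [{"t": "Para", "c": [{"t": "Str", "c": "Hello"}]}],
}

if shutil.which("pandoc"):  # only run the demo when pandoc is installed
    try:
        print(render(doc, "org"))
    except ValueError as err:
        print(err)
```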
Edit: is this what I'm looking for?: https://hackage.haskell.org/package/pandoc-types-1.22/docs/Text-Pandoc-Definition.html
I see that some of the types in the JSON representation come from here. Do you think JSON conversion is the best way for me to add support for a new language, or should I look elsewhere?
Yes, the json is just a serialization of the types defined in that module.
I'm working on a Lua-based pandoc reader. Can this Lua reader be integrated into pandoc?
No, a Lua reader can't be integrated as a regular pandoc reader. It might provide guidance in writing a Haskell reader.
It might provide guidance in writing a Haskell reader.
Can a Haskell reader run a Lua reader inside? I'm curious whether I can reuse the existing codebase through pandoc's Lua module, or whether I should rewrite the whole parser in Haskell.
Can a Haskell reader run a Lua reader inside?
In principle this is possible, sure, but I don't want to do this. I'd like the official readers and writers to be in Haskell.
Note also that we'd want to wait til neorg gets more traction (or doesn't) before deciding to include a dedicated reader/writer.
Three years later, what's your opinion?
I don't know enough about neorg to say. Maybe you can help. How widely used is it, how stable is it, how much does it diverge from Emacs org-mode syntax? Is it changing so rapidly as to be a moving target?
I've been trying Neorg for only a few days, and in general I think I like it. But switching to it completely would mean converting the hundreds of markdown files I have, which is exactly why I asked the question. They seem to have functionality to export norg to markdown, but none to import from markdown. My guess is that they would get more traction if it were not for this gap.
As to your questions, I can only share my impressions after a few days of acquaintance with the app and the specs:
- In the neovim community, it seems to be, if not very popular, at least well known (6k+ GitHub stars, third-party YouTube tutorials, etc.). I don't think there are specialized tools outside neovim for working with norg-format files at this time, though (they have a mobile app on their roadmap).
- Breaking changes still happen. But, AFAIK, they concern only the functionality/commands of Neorg, not the norg-format syntax.
- I don't think it is relevant to talk about divergence from org-mode, since norg is not a fork of it. It probably borrowed certain ideas and even syntax elements from org-mode, but being similar to it is not norg's intention (I personally find norg syntax cleaner).
I don’t think Norg is ready to be implemented as a pandoc target. The Norg format itself still has some issues that should be fixed, and it may have breaking changes soon. For example, we can’t convert this markdown syntax to norg yet:
- list item
  - sublist item
  - sublist item
  first list item continues
- another list item
I am not an expert in parsers or markup syntax, but I can see that the tree-sitter markdown parser identifies the line "first list item continues" as a subnode of the second sublist item. Is that what one would expect? Also, out of interest, how would you translate your example to org-mode format?
Well, I was wrong about that example, sorry. I meant this:
- list item
  - sublist item
  - sublist item

  first list item continues
- another list item
I don’t know about orgmode so I can’t translate this to it.
There are still a lot of discussions going on about the Norg spec (e.g. slides, free-form non-verbatim attached modifiers are not even documented well in the v1 spec, and there are tons of edge cases not covered). It is clearly not stable, at least for now. Also, there isn’t any working parser for Norg right now. The v1 tree-sitter parser still has a hard time parsing standard ranged tags, and the v3 tree-sitter parser is still in development (due to various syntax design issues, which is why I’m saying it will have breaking changes soon).