Scope, aim and future of orgparse

Open novoid opened this issue 4 years ago • 7 comments

Hi!

Author of this primitive Org mode to Python3 parser speaking. I'd love to replace my stupid parser with a decent one in the future - if possible.

So: what is the goal of orgparse? What is the scope? What is the non-scope? What is the vision? I'd love to read such a small section in your readme file.

For example: will orgparse be able to parse all important Org mode syntax elements such as lists, tables, internal and external links, footnotes, text formatting (italic, underline, bold, ...), and so forth?

Currently, orgparse seems to store the content of a heading without analyzing it further, except for various time- and date-stamps.

Oct 01 '19 22:10 novoid

Hi!

Thanks for the good questions! I guess the ultimate goal is to have a fast, well-tested and stable Python org-mode parser.

Another thing I'd like it to keep is the BSD license the original author used. That means only using documentation and reverse engineering for the parsing, regexes etc., but I think ultimately it's better for the format and the Org ecosystem not to be restricted by a copyleft license.

I'm using Org-mode as my primary means of organising information and logging, and I don't see a good alternative to the format out there, so I'm planning to support the parser for the foreseeable future.

That said, for my purposes the parser more or less satisfies me now, since I mainly use outlines, timestamps and tags. However, there are a couple of projects where I'm extracting links and tables as well (in a somewhat ad hoc way), and I'd like to integrate that too (https://github.com/karlicoss/orgparse/issues/8).
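For illustration, here is a minimal sketch of what such ad hoc extraction from a plain body string might look like, using only the standard library. The helper names `extract_links` and `extract_tables` are made up for this example and are not part of orgparse; the regex only covers the basic `[[target][description]]` link form and simple pipe tables.

```python
import re

# Org links look like [[target][description]] or [[target]].
LINK_RE = re.compile(r"\[\[([^\]\[]+)\](?:\[([^\]\[]+)\])?\]")


def extract_links(body):
    """Return (target, description) pairs; description is None for bare links."""
    return [(m.group(1), m.group(2)) for m in LINK_RE.finditer(body)]


def extract_tables(body):
    """Return each table as a list of rows, each row a list of cell strings."""
    tables, current = [], []
    for line in body.splitlines():
        stripped = line.strip()
        if stripped.startswith("|"):
            if stripped.startswith("|-"):
                continue  # skip horizontal rules such as |---+---|
            current.append([c.strip() for c in stripped.strip("|").split("|")])
        elif current:
            # A non-table line ends the table collected so far.
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


body = """See [[https://orgmode.org][Org mode]] and [[file:notes.org]].

| name | count |
|------+-------|
| foo  | 1     |
"""
links = extract_links(body)
tables = extract_tables(body)
```

This kind of helper works on the strings the parser already returns, which is why it is easy to bolt on but also easy to get subtly wrong (nested brackets, escaped pipes and so on).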

My personal blogging setup is a bit different: I use Emacs in batch mode to render my org-mode files and then do a bit of post-processing on top of it (mainly to fix things org-export does that I find stupid). However, I also want to do some pre-processing (e.g. to support some dynamic things/content filtering), and for that I'm definitely not going to use Elisp :) So I'm motivated to support proper structure parsing too.

I could spare some time to implement other syntax which I'm not using as well. I guess it would be interesting to hear what the highest priority is for people. If you could give a sample org mode file with the bits of syntax that matter most to you (and perhaps the Python interface that you'd expect for them), that would help me prioritize :)

In terms of implementing it, I'd like to keep the interface returning strings wherever possible (e.g. .heading/.body) for the sake of simplicity, so perhaps a separate method returning 'rich' content would serve people's need for more structure well. I'll have a think, run some experiments on how to implement it, and share. If you have any other libraries in mind for interface inspiration, please let me know!
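To make the idea concrete, here is a rough sketch of that split: plain string attributes stay as they are, and a separate method returns structured fragments. Everything here (`Node`, `rich_body`, `Text`, `Link`) is hypothetical and only illustrates the shape of such an interface; it is not orgparse's actual API, and it only recognises basic links.

```python
import re
from dataclasses import dataclass

LINK_RE = re.compile(r"\[\[([^\]\[]+)\](?:\[([^\]\[]+)\])?\]")


@dataclass
class Text:
    value: str


@dataclass
class Link:
    target: str
    description: str


class Node:
    """Hypothetical node class, not orgparse's real one."""

    def __init__(self, heading, body):
        self.heading = heading  # plain string, as before
        self.body = body        # plain string, as before

    def rich_body(self):
        """Split the body into Text and Link fragments, in document order."""
        parts, pos = [], 0
        for m in LINK_RE.finditer(self.body):
            if m.start() > pos:
                parts.append(Text(self.body[pos:m.start()]))
            # Bare links fall back to the target as their description.
            parts.append(Link(m.group(1), m.group(2) or m.group(1)))
            pos = m.end()
        if pos < len(self.body):
            parts.append(Text(self.body[pos:]))
        return parts


node = Node("Reading list", "Start with [[https://orgmode.org][the manual]] today.")
fragments = node.rich_body()
```

Callers who only want strings keep using `.heading` and `.body`; only those who need structure pay for the extra parsing.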

P.S. Good point about readme section; I'll add it when I shape my thoughts a bit more clearly.

Oct 04 '19 09:10 karlicoss

Hi, I'm not saying that my decisions are perfect. However, https://github.com/novoid/lazyblorg/wiki/Orgmode-Elements lists the Org mode elements my naïve parser supports (and which I am using for my own blog, http://karl-voit.at/ ) and https://github.com/novoid/lazyblorg/wiki/Data-Structures gives a short intro to some of the data structures.

My choice so far was to return a list of Org mode elements, each of which is itself a list. You can see examples at https://github.com/novoid/lazyblorg/wiki/Data-Structures#representation-of-blog-data under the key "content".

Oct 04 '19 19:10 novoid

https://github.com/novoid/Memacs/blob/master/memacs/lib/orgformat.py could be a promising connection point between our projects. Maybe we are able to develop a merged version from both projects as a standard library for formatting strings into Org mode elements.

My code may not be 100% clean, since it was developed "on demand" whenever something was needed. I once added some unit tests, which do not provide full coverage. However, the tests show some basic examples: https://github.com/novoid/Memacs/blob/master/memacs/lib/tests/orgformat_test.py

Oct 05 '19 23:10 novoid

This issue is quite old, but I just want to add my thoughts, too.

I am using org-roam-v2, which uses IDs to link notes together. There are some "solutions" around for creating HTML content out of it, but none of them work well for me. I do not need a blog/website, just HTML files for offline use.

Also, org-mode itself has problems exporting content while taking the ID links into account. A side problem is that this is not reproducible but occurs often, and I am not the only person reporting such problems. I invested too much time in finding a solution, understanding bugs etc.

Now I have decided to write my own "org-to-html-thing". I do not want to learn Lisp. I come from Python, so this is my choice. I was really glad to see that something like orgparse exists. This saves half of the work for me.
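As an illustration of the ID-link part of such a converter, here is a small sketch that rewrites `[[id:...][description]]` links into HTML anchors. The function name `render_id_links` and the `id_to_page` mapping (ID to generated HTML file name) are assumptions made up for this example; it uses only the standard library and ignores every other kind of markup.

```python
import html
import re

ID_LINK_RE = re.compile(r"\[\[id:([^\]\[]+)\](?:\[([^\]\[]+)\])?\]")


def render_id_links(text, id_to_page):
    """Replace [[id:...][description]] links with HTML anchors.

    id_to_page is a hypothetical mapping from a note's ID to the HTML
    file generated for it. Links to unknown IDs are rendered as plain
    text instead of a dead anchor.
    """
    parts, pos = [], 0
    for m in ID_LINK_RE.finditer(text):
        # Escape the ordinary text between links.
        parts.append(html.escape(text[pos:m.start()]))
        note_id, desc = m.group(1), m.group(2) or m.group(1)
        page = id_to_page.get(note_id)
        if page is None:
            parts.append(html.escape(desc))
        else:
            parts.append(
                '<a href="{}">{}</a>'.format(html.escape(page), html.escape(desc))
            )
        pos = m.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


pages = {"1234-abcd": "python.html"}
rendered = render_id_links(
    "See [[id:1234-abcd][Python notes]] & [[id:missing][lost]].", pages
)
```

Building the ID-to-page mapping in a first pass over all notes, and only then rendering, keeps link resolution independent of the order in which files are processed.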

Since yesterday I have had a working prototype. I will polish it up a bit and publish it on my main account on Codeberg in the next few days.

Btw: I am open to naming suggestions. I am not good with things like that. :D

Mar 15 '22 11:03 buhtz

> Now I decided to write my own "org-to-html-thing". I do not want to learn Lisp. I am from Python so this is my choice.

You are aware of https://pypi.org/project/pypandoc/ ?

I use it as a fallback for single Orgdown elements in lazyblorg and it's doing great so far.

Mar 16 '22 16:03 novoid

> I could spare some time to implement other syntax which I'm not using as well. I guess it would be interesting to hear what the highest priority is for people. If you could give a sample org mode file with the bits of syntax that matter most to you (and perhaps the Python interface that you'd expect for them), that would help me prioritize :)

Meanwhile, there is https://gitlab.com/publicvoit/orgdown which defines an initial level of syntax elements I'd expect to be included in the supported elements of any Org-mode-syntax parser. Maybe this is a selection of elements that makes sense to more people.

Mar 16 '22 16:03 novoid

> You are aware of https://pypi.org/project/pypandoc/ ?

Not until now. It seems it can handle some basic org constructs and convert them to HTML. But I would need a pandoc binary in the background.

I will keep this in mind when I get to some more complex org constructs. But currently I only need text, links, headings, lists and source blocks.

Mar 16 '22 20:03 buhtz
