
Optionally parse org data-trees as nodes instead of pages

Open ispringle opened this issue 2 years ago • 2 comments

It'd be great if there were a way to parse an org file's data-tree and convert it into nodes of data, as opposed to the more Markdown-style parsing where the whole file is a single entity.

For example given this org file:

* blog
** This is a post
    Here is some post content
** This is another post
    This is some different post content

Currently the above file would get parsed as a single entity, and you'd end up with an <h1>blog</h1> followed by the h2 headings under it. If we parsed this in a more org-ish way and treated headings as nodes on a data-tree, we'd end up with a data structure such as:

nodes: [
  blog: {
    content: "...",
    nodes: [
      "This is a post": {...},
      "This is another post": {...},
    ],
    ...
  },
  ...
]

ispringle, Aug 04 '22 19:08

Hey.

This structure is less "org-ish" because it doesn't follow org structure. Examples of cases that would be hard to handle:

  • inlinetasks (headings in-between content)
  • headings that don't nest nicely: *** headings under * ones
  • repeated heading titles
  • the order of headings is almost lost
  • can your blog posts have any heading? Should all headings be nodes or should we apply an arbitrary rule? (e.g., to only lift headings with ids as org-roam does)

I'd say that this is a rather specific use case (making all headings into "nodes") and I wouldn't implement it. The good news is that it is easy to do yourself: you could traverse org-data and section nodes and lift their section children as nodes (if they satisfy your lifting condition).

(Lifting all headings with IDs as nodes is more common (org-roam) and I would love to see that as a library.)
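A minimal sketch of that traversal, assuming a simplified tree shape rather than uniorg's actual node types: `Section`, `OrgData`, `LiftedNode`, `liftNodes`, and `hasId` are all hypothetical names made up for illustration. Lifted nodes are kept in arrays, so heading order and repeated titles survive.

```typescript
// Simplified stand-ins for a parsed org tree. These are NOT uniorg's real types.
interface Section {
  type: "section";
  title: string;
  properties: Record<string, string>; // e.g. { ID: "post-1" }
  content: string; // body text directly under the heading
  children: Section[]; // nested sub-sections
}

interface OrgData {
  type: "org-data";
  children: Section[];
}

// The lifted "node" shape: arrays keep order and allow duplicate titles.
interface LiftedNode {
  title: string;
  content: string;
  nodes: LiftedNode[];
}

type LiftPredicate = (section: Section) => boolean;

function liftSections(sections: Section[], shouldLift: LiftPredicate): LiftedNode[] {
  const result: LiftedNode[] = [];
  for (const section of sections) {
    const nested = liftSections(section.children, shouldLift);
    if (shouldLift(section)) {
      result.push({ title: section.title, content: section.content, nodes: nested });
    } else {
      // This section is not a node itself: hoist its lifted descendants one level up.
      result.push(...nested);
    }
  }
  return result;
}

function liftNodes(doc: OrgData, shouldLift: LiftPredicate): LiftedNode[] {
  return liftSections(doc.children, shouldLift);
}

// Example lifting condition: only headings that carry an ID property.
const hasId: LiftPredicate = (s) => "ID" in s.properties;

// The blog example from the issue, with hypothetical IDs added to the posts.
const doc: OrgData = {
  type: "org-data",
  children: [
    {
      type: "section",
      title: "blog",
      properties: {},
      content: "",
      children: [
        { type: "section", title: "This is a post", properties: { ID: "post-1" },
          content: "Here is some post content", children: [] },
        { type: "section", title: "This is another post", properties: { ID: "post-2" },
          content: "This is some different post content", children: [] },
      ],
    },
  ],
};

const everyHeading = liftNodes(doc, () => true); // "blog" with two nested posts
const onlyWithIds = liftNodes(doc, hasId); // the two posts, hoisted to the top level
```

The lifting condition is passed in as a function, so the choice between "all headings" and "only headings with IDs" stays with the caller. Note that this sketch drops the content of sections that fail the condition; a real implementation would have to decide where that text goes.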

rasendubi, Aug 06 '22 12:08

Yes, there would need to be some property value that signals the heading is a leaf and not another node in the tree. ox-hugo does this by saying that any heading whose property drawer contains an :EXPORT_FILE_NAME: property is a leaf, and all the content in it is treated as content to be transformed into HTML. All my blog posts already contain an ID, so perhaps that would be a good avenue to pursue.
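Either rule could be written as a small predicate over a heading's properties. The sketch below assumes a simplified heading shape; `Heading`, `NodeProperty`, `hasProperty`, `isExportLeaf`, and `isIdNode` are hypothetical names, not part of uniorg or ox-hugo. Keys are compared case-insensitively because org property keys are case-insensitive.

```typescript
// Simplified stand-in for a heading and its property drawer; not a real library type.
interface NodeProperty {
  key: string;
  value: string;
}

interface Heading {
  title: string;
  properties: NodeProperty[];
}

// True when the heading's property drawer contains the given key.
function hasProperty(heading: Heading, key: string): boolean {
  const wanted = key.toUpperCase();
  return heading.properties.some((p) => p.key.toUpperCase() === wanted);
}

// ox-hugo-style rule: a heading with EXPORT_FILE_NAME is a leaf to export.
const isExportLeaf = (h: Heading): boolean => hasProperty(h, "EXPORT_FILE_NAME");

// org-roam-style rule: a heading with an ID is a node.
const isIdNode = (h: Heading): boolean => hasProperty(h, "ID");

// Hypothetical sample headings.
const post: Heading = {
  title: "This is a post",
  properties: [{ key: "ID", value: "post-1" }],
};
const container: Heading = { title: "blog", properties: [] };
```

Here `isIdNode(post)` is true while `isIdNode(container)` and `isExportLeaf(post)` are false, so the container heading would stay a plain grouping and only the post would be lifted.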

ispringle, Aug 23 '22 23:08