Add the ability to export norg files to HTML

Open adc613 opened this issue 8 months ago • 5 comments

Mirrors the methods used by markdown exporter to export files to HTML

Provides an option to apply a custom handler for norg range tags

adc613 avatar Apr 29 '25 16:04 adc613

First off, wow thank you for going through the effort to do this. This is massive.

Second off, I'm so sorry for the number of bugs that I'm about to point out. For all I know these all exist in the markdown exporter too; I don't use it myself.

I'll order them roughly by importance I guess:

  • Illegal characters are not escaped; this includes <, >, and others. This matters a lot because it makes it possible to generate invalid HTML.
  • List items with text that spans multiple lines become multiple list items (image).
  • Generic links need to be resolved so they can correctly link to headings. Currently you get a link with #generic-headingname, but the heading id is #heading1-headingname, so the link will never work.
  • Code block indentation is off. This is a common mistake: the text in a code block all shares a leading indent, but the TS node for the content doesn't include the first line's leading indent, so you have to get it from the buffer to remove it. Otherwise you end up with this (image).
  • Inline verbatim and math are represented with pre tags, which are block-level by default, so text ends up getting split up when it shouldn't be.
  • Inline link targets aren't given ids to link to, and they create new paragraphs, which places text on different lines when it shouldn't be. Example: (image)
  • Definition and footnote links don't work.
  • Code blocks that have unrecognized languages still get class="python" for some reason; they should just not have a language class.
  • No class for spoilers, just a span (doesn't really matter honestly).
  • Duplicate headings aren't handled, so you can end up with the same heading ID twice (I think this is safe to ignore for now).
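
For illustration only, three of the code block points above (escaping, shared indent, language class) can be sketched in a few lines. This is Python rather than the exporter's own code, and `render_code_block` and `KNOWN_LANGUAGES` are hypothetical names, not anything from the PR. It assumes the raw buffer lines are available, including the first line's indent.

```python
import html
import textwrap

# Hypothetical set of languages the exporter knows how to label.
KNOWN_LANGUAGES = {"lua", "python", "rust"}


def render_code_block(raw_lines, language=None):
    """Render a code block from its raw buffer lines.

    raw_lines are full lines as they appear in the buffer, so the
    first line still carries the indent shared by the whole block.
    """
    # Remove the indent common to every non-blank line.
    body = textwrap.dedent("\n".join(raw_lines))
    # Escape &, < and > so the content cannot produce invalid HTML.
    body = html.escape(body, quote=False)
    # Only emit a class when the language is recognized.
    attr = f' class="{language}"' if language in KNOWN_LANGUAGES else ""
    return f"<pre><code{attr}>{body}</code></pre>"
```

An unrecognized language simply yields `<pre><code>` with no class attribute, rather than falling back to a default.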

Let me know if you want help with any of this; I'm very interested in this feature myself.

benlubas avatar May 04 '25 19:05 benlubas

I also think that it would be nice to have an option to strip external file links out of the HTML (they break anyway). My use case is for emails; ideally I'd be able to copy something from my notes that may link elsewhere in my notes without having to manually remove the link (b/c the link is useless to the email recipients).

But this is out of scope of the PR, I'll add this feature myself at a later date

benlubas avatar May 04 '25 19:05 benlubas

Thanks for the detailed feedback and thorough testing. I tried to fix most of the issues. The two big things that I didn't address were math statements and footnotes.

For math statements, I think the ideal solution would be to use the math tag, but I didn't see a way to do that without parsing the inner content. If you have any thoughts here, let me know. It's structured so that changing the tag is a single line of code, so it'd be relatively simple to use a tag other than <pre> if you think there's a more appropriate one.

I also didn't address footnotes. If you have any opinions on how best to handle them, let me know. I wasn't sure what the best strategy would be. For simplicity's sake, I kept the footnotes in a structure that mirrors the treesitter output, as opposed to doing something more opinionated (e.g. moving footnotes to the bottom of the output). Also, if someone required footnotes, my hope was that the current structure would make it simple enough to show/hide footnotes on hover events in CSS.

adc613 avatar May 05 '25 17:05 adc613

The math in norg is normally LaTeX, but some people use Typst. I think just leaving the content alone is fine (certainly for the initial PR), as long as it's displayed inline. Here's an example of what I'm talking about:

this: Inline `verbatim`, and math: $|\frac{1}{4}|$. generates this:

<p>
  Inline
<pre>verbatim</pre>, and math: <pre class="inline-math">\frac{1}{4}</pre>.
</p>

which looks like this: (image) when it should look like this: (image)
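
A minimal sketch of the inline alternative, in Python for illustration. `render_inline` is a hypothetical helper, and the choice of `<code>` and `<span>` is an assumption, since the thread does not say which tags the exporter ended up using. The `inline-math` class is taken from the output above.

```python
import html


def render_inline(kind, text):
    """Render an inline norg element with an inline-level tag."""
    escaped = html.escape(text, quote=False)
    if kind == "verbatim":
        return f"<code>{escaped}</code>"
    if kind == "math":
        # Content is left alone apart from escaping; it may be LaTeX or Typst.
        return f'<span class="inline-math">{escaped}</span>'
    return escaped


paragraph = (
    "<p>Inline "
    + render_inline("verbatim", "verbatim")
    + ", and math: "
    + render_inline("math", r"\frac{1}{4}")
    + ".</p>"
)
```

Because `<code>` and `<span>` are inline elements, the paragraph stays on one line in the browser instead of being broken around each `<pre>`.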


re: footnotes, I think no need to do anything fancy, just put them all at the bottom of the document after an <hr/> tag. They don't even need to be styled. And then give them ids and link to them.
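
Roughly, that could look like the sketch below (Python for illustration only). The `footnote-N` id scheme and both function names are assumptions made for the example, not what the PR implements.

```python
import html


def render_footnotes(footnotes):
    """Render (title, content) pairs as a section at the bottom of the page."""
    if not footnotes:
        return ""
    parts = ["<hr/>"]
    for index, (title, content) in enumerate(footnotes, start=1):
        safe_title = html.escape(title, quote=False)
        safe_content = html.escape(content, quote=False)
        # Each footnote gets an id so links in the body can point at it.
        parts.append(f'<p id="footnote-{index}">{safe_title}: {safe_content}</p>')
    return "\n".join(parts)


def footnote_link(index, label):
    """Render a link in the body text that jumps to footnote number `index`."""
    return f'<a href="#footnote-{index}">{html.escape(label, quote=False)}</a>'
```

The exporter would collect footnotes while walking the document, emit `footnote_link` where each one is referenced, and append the result of `render_footnotes` after the main content.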


Other stuff:

  • I've noticed that the HTML escape doesn't affect text inside of math tags, ranged verbatim (code blocks), inline verbatim, or inside of bold/italic/etc.
  • The way you handled generic links was surprising, I expected you'd resolve the links (using the hop module) to get the correct type and then use that. So instead of #generic-whatever you would resolve #whatever and figure out "oh it's a heading two" and use #heading2-whatever for the link. I think this method is much cleaner, and less error prone in the long run (you just implement the resolution code once vs adding a generic span to every single link target), but this works for now
  • Every code block has a class of "python" even when it's not python code
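
To make the link resolution idea concrete, here is a small Python sketch. It uses a plain lookup table as a stand-in for the hop module, which is not modelled here. `slugify`, `build_target_index`, and `resolve_generic_link` are hypothetical names, and the slug rule is an assumption. Only the `kind-name` id shape (e.g. `#heading2-whatever`) comes from the discussion above.

```python
import re


def slugify(title):
    """Assumed slug rule: lowercase, runs of other characters become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def build_target_index(targets):
    """Map each slug to its typed anchor, given (kind, title) pairs."""
    index = {}
    for kind, title in targets:
        slug = slugify(title)
        # Keep the first match so a generic link resolves to the earliest target.
        index.setdefault(slug, f"{kind}-{slug}")
    return index


def resolve_generic_link(title, index):
    """Return the href for a generic link, or None if nothing matches."""
    anchor = index.get(slugify(title))
    return f"#{anchor}" if anchor else None
```

With targets `("heading1", "Intro")` and `("heading2", "Whatever")`, a generic link to `whatever` resolves to `#heading2-whatever`, so no extra `#generic-...` span is needed on each target.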

benlubas avatar May 09 '25 01:05 benlubas

I think I fixed most of the issues you mentioned, but let me know if you find anything. It's difficult to exhaustively test the exporter, and I suspect there are more edge cases that I missed.

I agree using the hop method would be less error prone. If I have time later, I may look into it.

adc613 avatar May 10 '25 02:05 adc613

Hi, I just thought I'd follow up and ask: is this pull request worth pursuing? I'm happy to address comments if it's a worthwhile change, but I didn't want to spend too much time on this if it's never going to be used.

I also ran into an issue with the hop module (link) when I tried to use hop for link generation in this branch.

adc613 avatar Jun 30 '25 19:06 adc613

Hi, thanks for the ping. I took a look a while ago while I was on vacation and then totally forgot about it.

Indeed it looks like you've fixed the things I mentioned.

The next thing that stands out is just footnote formatting, it's sometimes a little unclear what's what in the footnotes section. I don't think that it should prevent this from being merged though.

The issue you encountered has happened in a bunch of places since 0.11, we can definitely fix that

benlubas avatar Jul 04 '25 01:07 benlubas

Yeah, I agree, I'm not sure what the best way to render footnotes is. I could pull the logic into the config so that people can configure it. Would that be preferable? (example)

adc613 avatar Jul 06 '25 14:07 adc613

Does this PR still have a chance of being worked on? This feature would be incredibly useful to have.

rfmineguy avatar Aug 21 '25 19:08 rfmineguy

@rfmineguy you're welcome to use the branch for now. Is there something still missing from it?

I'm thinking I'll just merge it as is, and it can be improved over time

benlubas avatar Aug 22 '25 00:08 benlubas

Merging this for now. If anyone wants to submit improvements they're welcome to do so!

Thank you @adc613 for the contribution!

benlubas avatar Sep 20 '25 21:09 benlubas