nimforum Formatting of search results is poor

Surely this is not a reasonable result for static statement:

Screen Shot 2022-09-28 at 21 12 47

The code block is broken, and a random sentence is printed in a huge font while the title's weight is weaker than the author's.

Sep 29 '22 04:09 esafak

I just tried to take a look at this out of curiosity and received this output in the forum:

That seems reasonable enough to me. I'm not 100%, but was the issue maybe fixed in the meantime and can thus be closed? Or am I overlooking anything?

Jan 24 '23 13:01 PhilippMDoerner

it does indeed seems it is fixed but looking at past commits it is unclear what fixed it (I do remember the issue in the past, do not remember recent occurences). and unclear whether or not the forum has been recently deployed. https://github.com/nim-lang/nimforum/issues/317 would be useful :)

Jan 24 '23 14:01 pietroppeter

https://forum.nim-lang.org/t/7969#50770 seems to be broken: "Couldn't render historic post in #50770"

And it cannot be rendered properly in the search results.

Jan 24 '23 15:01 ringabout

mmh, if this screenshot is to be trusted we do have a last deploy last october:

Jan 24 '23 15:01 pietroppeter

https://forum.nim-lang.org/t/7969#50770 seems to be broken: "Couldn't render historic post in #50770"

this is related to this (still open) issue: https://github.com/nim-lang/nimforum/issues/330

I think there are some cases still not working

Jan 24 '23 15:01 pietroppeter

@pietroppeter and @PhilippMDoerner Search the forum for option.sepBy. You should get this:

The issue is only present when the code block gets cut off and doesn't have an ending and/or beginning ```

Jan 24 '23 15:01 jyapayne

I can see the issue in the forum, however, I can't seem to replicate it when I run this locally.

I pretty much just copy pasted jack's post above to make a thread out of it, but no matter what I search, this is the kind of stuff I get:

This is with the forum compiled under 1.9.1, which shouldn't make a difference here.

Jan 24 '23 16:01 PhilippMDoerner

Ohhhhh it's explicitly the comment ## match --<name>:<option> that is making an issue here. The combination of the comment being ## and no indication that this should be a codeblock being present. So basically it just sees the "##" in normal text and assumes "That's normal markdown, so that's a headline" and then we have our problem.

Now the next question that follows this is: If you have a code-comment in a search-result code-block and the indicators for the code-block get cut off... how do you treat this then? That seems like a pretty ugly issue to fix. To prevent this kind of thing you'd kind of have to parse the text to ensure if a code-block occurs that it is always closed at the very least at the end of the comment before ellipsing out.

And that is only if the snippet you want to show is near the start or the end of a codeblock. If you're inside of a code-block you also need to scan this and put the entire snippet in between codeblock markings. I'll need to investigate whether that is even possible since I know this makes use of sqlite FTS, though I'm more familiar with FTS5 than FTS4

Jan 24 '23 16:01 PhilippMDoerner

@PhilippMDoerner You are correct. This issue would likely be fairly difficult to solve based on the reasons you mentioned. With the current structure of the code, I'm also not sure how to approach it. FTS with the snippet extension only returns the partial text from each match, so you'd probably need to get the full text from sqlite to compare so you'd be able to see where the start/end code block markers are. An ugly problem indeed.

I was curious how Discourse solves this, and it looks like they just strip all formatting and render the plain text: https://discourse.codecombat.com/search?q=penPositions%20%3D. Maybe we could do that as well.

Jan 24 '23 21:01 jyapayne

they just strip all formatting and render the plain text

It seems a very reasonable thing to do indeed. Maybe highlighting (bold?) the search term as they do

Jan 24 '23 22:01 pietroppeter

they just strip all formatting and render the plain text

It seems a very reasonable thing to do indeed. Maybe highlighting (bold?) the search term as they do

That actually runs into problems for exactly the example we were looking at. Say the comment looks like this:

## [<p1> (<p2> <p1>)*]

And you have a searchquery of <p1>. Now naturally you can just go and do a "replace" on the HTML string to insert  <p1>  for any hit to<p1> in the string. The issue comes in rendering. How do you render this int he frontend correctly now, because you want the  tags to render as HTML, while you want <p1> to render as text? If you try to render this as HTML, you'll get the <p1> bold but you also get interpreted to be rendered as HTML and it starts disappearing.

Demonstration:

As you can see, there is no to be found anywhere.

Now we can of course circumvent that, we can say to just render the text as is. But that means it'll also render the  tags for text instead of as HTML.

Demonstration:

So either we just render plaintext, or we accept that any HTML-strings in the content itself will not be rendered as text.

As an additional issue though, I haven't found an easy way in karax so far to retain the whitespacing, so it all looks like the unreadable messy textblobs you see above.

Jan 25 '23 14:01 PhilippMDoerner

@PhilippMDoerner If you want to render a search of <p1>, you will have to escape the characters that HTML considers special characters. See here for a complete table of conversions. I believe you should only have to escape &, <, >, ", and '. See the lodash escape table.

Jan 25 '23 21:01 jyapayne

So in your case, you would replace <p1> by  <p1> 

Jan 25 '23 21:01 jyapayne

It does turn out we're overcomplicating things. I just throw together a bit of a solution for multiple issues:

The snippets that get rendered (using SQLITE FTS4-SQL feature called SNIPPET) sometimes fail to render properly. You instead get to actually see the ** that should get rendered into strong tags, because there's no whitespace after the match. I added those, that get rid of an entire class of rendering issues
To prevent ## and # (the leading whitespace is important) to be rendered into a headline, we might as well do a simple replace of # in general to \\# . Most of the time that fixes the issue, sometimes that leads to \# being rendered instead of #. Which I can live with.

Mysteriously that even takes care of the HTML issue

Jan 26 '23 14:01 PhilippMDoerner

That fixes the bolded text issue, but now you have more issues because the markdown is parsing proc sepBy[T, T2](p1: Parser[T], p2: Parser[P2]): Parser[seq[T]] = as proc sepBy T, T2: Parser[seq[T]]. You're losing information in the text because markdown sees [X](Y) and parses it as a link Y with text X. I still think you're better off using the plaintext and escaping the HTML special characters as I mentioned above. Otherwise you're going to have to keep finding hacks to workaround markdown parsing issues.

Jan 26 '23 15:01 jyapayne

The alternative is plain-text and hacks that work around making things sometimes bold and sometimes not, we're running into hacks either way. Because you'll have to escape HTML-characters first and then replace whatever placeholder you inserted around search-matches with  and  tags, Neither feels particularly clean. The benefit this one provides is that at the very least it provides some more readable formatting.

Jan 26 '23 15:01 PhilippMDoerner

But they aren't hacks. HTML has special characters that need to be escaped if you want to display them normally. Replacing &, <, >, ", and ' with their respective escape codes is how you insert text into an HTML element in JS.

I get the formatting will be cleaner in the code samples, but that isn't really an issue when you can just click into the post and read the full code. There is a good reason that Discourse doesn't render text with formatting. It's tricky business and broken text gets rendered like it currently does in the Nim forum.

Also, if you start escaping Markdown characters, the formatting in the regular, non-code text will be broken and won't display headers and links correctly.

Jan 26 '23 15:01 jyapayne

nimforum nimforum copied to clipboard

Formatting of search results is poor

nimforum
nimforum copied to clipboard