go-strip-markdown

Option to exclude hashtags or mentions

Open inliquid opened this issue 3 years ago • 3 comments

Hi!

This library is very helpful for creating short descriptions from posts. However, there are a few corner cases. One of them is that we have entities with special meaning, such as #hash_tags and @m_e_n_t_i_o_n_s. They are pre-processed into valid MD before being rendered as MD. However, when we use Strip to remove Markdown from the raw content, they can get broken, for example:

in:

#one #two #three #four #five_six_seven_eight #nine_ten_eleven #twelve

out:

#one #two #three #four #fivesixseveneight #nineten_eleven #twelve

It would be great to be able to exclude them, for instance by providing a list of regexps that "mark" some blocks as "excluded".

inliquid, Jul 27 '22 17:07

Thanks for the suggestion! If you want to submit a pull request, I'll be happy to review it.

Some Markdown parsers will pick up and italicize words inside other words, when they're surrounded by underscores. E.g. nine_ten_eleven might look to it like _ten_ should be italicized. I imagine that's what's happening here -- the library is stripping the "markdown" around those inner words.

thebaer, Jul 30 '22 15:07

In fact, underscores do cause problems for parsers, and that's why it's recommended to use * over _. There's also a recommendation under best practices here: https://www.markdownguide.org/basic-syntax/#italic-best-practices

We can add an option to skipUnderscores while parsing.
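Roughly what I have in mind, as a sketch only: the `Options` struct, the `SkipUnderscores` field, and `StripEmphasis` below are illustrative names, and the two regexps are simplified assumptions rather than the library's actual patterns. Asterisk emphasis is always stripped; underscore emphasis is stripped only when the option is off.

```go
package main

import "regexp"

// Options is a hypothetical settings struct for this sketch.
type Options struct {
	// SkipUnderscores leaves _ characters alone, so identifiers such as
	// #nine_ten_eleven survive stripping.
	SkipUnderscores bool
}

// Simplified emphasis patterns (assumptions for illustration only).
var (
	starEmph  = regexp.MustCompile(`\*{1,2}([^*]+)\*{1,2}`)
	underEmph = regexp.MustCompile(`_{1,2}([^_]+)_{1,2}`)
)

// StripEmphasis removes emphasis markers, honoring opts.SkipUnderscores.
func StripEmphasis(s string, opts Options) string {
	s = starEmph.ReplaceAllString(s, "$1")
	if !opts.SkipUnderscores {
		s = underEmph.ReplaceAllString(s, "$1")
	}
	return s
}
```

The trade-off is that with the option on, genuine `_italic_` text keeps its underscores too, which is why it would have to be opt-in.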

daveteu, Sep 14 '22 06:09

@thebaer can you take a look at the PR?

daveteu, Sep 29 '22 03:09