go-strip-markdown
Option to exclude hashtags or mentions
Hi!
This library is very helpful for creating short descriptions from posts. However, there are a few corner cases. One of them is that we have entities with a special meaning, such as #hash_tags and @m_e_n_t_i_o_n_s. They are pre-processed into valid Markdown before being rendered as Markdown. But when we use Strip to remove Markdown from the raw content, these entities can get broken, for example:
in:
#one #two #three #four #five_six_seven_eight #nine_ten_eleven #twelve
out:
#one #two #three #four #fivesixseveneight #nineten_eleven #twelve
It would be great to be able to exclude them, for instance by providing a list of regexps that mark certain blocks as "excluded".
Thanks for the suggestion! If you want to submit a pull request, I'll be happy to review it.
Some Markdown parsers will pick up and italicize words inside other words when they're surrounded by underscores. For example, in nine_ten_eleven a parser might decide that _ten_ should be italicized. I imagine that's what's happening here: the library is stripping the "markdown" around those inner words.
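To illustrate the intraword problem, here is a minimal sketch of underscore stripping that only removes _..._ delimiters when the opening underscore is not preceded by a letter or digit and the closing one is not followed by one. This approximates the CommonMark rule that "_" cannot open or close emphasis inside a word; stripUnderscoreEmphasis is a hypothetical name, not part of go-strip-markdown, and the regexp-plus-neighbor check is a simplification rather than a full emphasis parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
	"unicode/utf8"
)

// candidate matches _text_ where text neither starts nor ends with
// whitespace and contains no underscores.
var candidate = regexp.MustCompile(`_([^_\s](?:[^_]*[^_\s])?)_`)

// isWordRune reports whether r is a letter or a digit.
func isWordRune(r rune) bool { return unicode.IsLetter(r) || unicode.IsDigit(r) }

// stripUnderscoreEmphasis (hypothetical) removes underscore delimiters only
// when they sit at word boundaries, leaving intraword underscores intact.
func stripUnderscoreEmphasis(s string) string {
	var b strings.Builder
	last := 0
	for _, m := range candidate.FindAllStringSubmatchIndex(s, -1) {
		start, end := m[0], m[1]
		// At the string edges these decode to utf8.RuneError, which is not a word rune.
		before, _ := utf8.DecodeLastRuneInString(s[:start])
		after, _ := utf8.DecodeRuneInString(s[end:])
		if isWordRune(before) || isWordRune(after) {
			continue // intraword underscores: leave them alone
		}
		b.WriteString(s[last:start]) // text before the opening underscore
		b.WriteString(s[m[2]:m[3]])  // the emphasized text without delimiters
		last = end
	}
	b.WriteString(s[last:])
	return b.String()
}

func main() {
	fmt.Println(stripUnderscoreEmphasis("#nine_ten_eleven and _italic_ text"))
	// prints "#nine_ten_eleven and italic text"
}
```

With this check, the hashtags from the original example pass through unchanged, while standalone _italic_ emphasis is still stripped.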
In fact, underscores do cause problems for parsers, which is why using * rather than _ is recommended, especially in the middle of a word. The Markdown Guide also recommends this in its italic best practices: https://www.markdownguide.org/basic-syntax/#italic-best-practices .
We could add an option, skipUnderscores, that leaves underscores untouched while parsing.
@thebaer, can you take a look at the PR?