whatsapp-chat-parser
Styling in WhatsApp messages
WhatsApp messages can contain styling within the text, such as bold, underline and strikethrough. The current format only allows for plain-text messages.
I'm new to open source but would be happy to help build this feature. However, I can't think of a way to get this working without breaking the API contract.
EDIT: They added more styles since I wrote this comment
Hi @4dwaith,
Interesting idea! Let's start by looking at which styles WhatsApp supports:
_italic_
*bold*
~strikethrough~
```monospace```
In HTML that would be:
<i>italic</i>
<b>bold</b>
<s>strikethrough</s>
<pre>monospace</pre>
<!-- or -->
<code>monospace</code>
We would need to either create some new regex patterns to detect the special characters or use a lightweight library to do it for us.
Several tests would be needed to catch the edge cases, for example if you have:
```var my_nice_variable = 'my string';```
It should not format _nice_ as italic because it's already inside a code block.
Or a URL containing underscores might get formatted and stop working.
There are many things that can go wrong.
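To make the regex option concrete, here is a deliberately naive sketch (the `naiveToHtml` name is hypothetical) that maps each WhatsApp marker to the HTML tags listed above. Because it knows nothing about code blocks or word boundaries, it reproduces exactly the code-block problem just described:

```javascript
// Deliberately naive: each marker pair is replaced independently,
// with no awareness of code blocks, URLs or word boundaries.
function naiveToHtml(text) {
  return text
    .replace(/```([\s\S]+?)```/g, '<code>$1</code>') // ```monospace```, may span lines
    .replace(/\*([^*\n]+)\*/g, '<b>$1</b>')           // *bold*
    .replace(/_([^_\n]+)_/g, '<i>$1</i>')             // _italic_
    .replace(/~([^~\n]+)~/g, '<s>$1</s>');            // ~strikethrough~
}

console.log(naiveToHtml('_italic_ *bold* ~strikethrough~ ```monospace```'));
// <i>italic</i> <b>bold</b> <s>strikethrough</s> <code>monospace</code>

console.log(naiveToHtml("```var my_nice_variable = 'my string';```"));
// <code>var my<i>nice</i>variable = 'my string';</code>  (the unwanted italic)
```

Avoiding that output means handling code blocks first and adding boundary checks, which is where the extra complexity comes from.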
With this in mind, I honestly think this could overcomplicate things a bit too much for my liking. I'd like to keep this library dependency-free and as simple as possible.
> can't think of a way to get this working without breaking the API contract.
That would not be a problem as long as the feature is implemented behind an optional configuration flag, something like this:
whatsapp.parseString(text, { parseRichText: true });
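One way to keep the existing output untouched is to treat rich text as an opt-in post-processing step over the already-parsed messages. This is only a sketch: the `withRichText` helper, the `parseRichText` and `toHtml` options, the extra `html` field and the `{ date, author, message }` message shape are assumptions for illustration, not the library's actual API.

```javascript
// Hypothetical post-processing step applied to already-parsed messages.
// With the flag off (the default) the input array is returned untouched,
// so existing consumers see exactly the same output as before.
function withRichText(messages, options = {}) {
  const { parseRichText = false, toHtml = (text) => text } = options;
  if (!parseRichText) return messages;
  // Add a separate field rather than rewriting `message`, so code that
  // only reads `message` keeps working even with the flag on.
  return messages.map((m) => ({ ...m, html: toHtml(m.message) }));
}

const parsed = [{ date: new Date(2020, 0, 1), author: 'Alice', message: '*hi*' }];
const bolden = (text) => text.replace(/\*([^*\n]+)\*/g, '<b>$1</b>');

console.log(withRichText(parsed) === parsed); // true
console.log(withRichText(parsed, { parseRichText: true, toHtml: bolden })[0].html); // <b>hi</b>
```

Adding a new field instead of changing `message` is what keeps the flag from breaking anyone who upgrades without turning it on.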
@Pustur These sequences look a lot like Markdown. Maybe you could use an existing Markdown formatting library (or perhaps the consuming code should use a Markdown rendering library, so you don't have to do anything at all).
@speshak I'm leaning more towards the second option: this should be done outside the library.
Also, while the format looks like Markdown, as far as I can tell it doesn't exactly match any common flavour of it. In the following example, both the italic and the bold are rendered as italic by default:
It seems possible to customize how that library works, but I'm not currently interested in doing so.
@Pustur Apologies, I have no idea why I didn't notice your first response. I should've responded months ago.
I'm not sure about the regex pattern. As your next example shows, whether or not to parse italics depends on whether we have previously encountered a code marker. The parser has to keep track of that state, so I don't think a plain regular expression will be enough.
That said, I don't think your two examples would be a problem: underscores only indicate italics if there is a space before the start mark and after the end mark, and no space after the start mark or before the end mark. URLs certainly wouldn't follow that rule, though code might.
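The two ideas above (remember whether we are inside a code block, and only treat underscores as italics when the boundary rule holds) can be sketched together: scan for ``` markers to split the text into code and non-code segments, then apply a boundary-aware pattern to the non-code segments only. The function names are hypothetical, only italics are handled, HTML escaping is left out for brevity, and the start or end of the text is assumed to count as a space for the boundary rule.

```javascript
// Split text into plain/code segments by tracking whether we are inside
// a ``` block. An opening ``` without a matching close stays plain text.
function splitCode(text) {
  const segments = [];
  let pos = 0;
  while (pos < text.length) {
    const open = text.indexOf('```', pos);
    const close = open === -1 ? -1 : text.indexOf('```', open + 3);
    if (open === -1 || close === -1) {
      segments.push({ code: false, text: text.slice(pos) });
      break;
    }
    if (open > pos) segments.push({ code: false, text: text.slice(pos, open) });
    segments.push({ code: true, text: text.slice(open + 3, close) });
    pos = close + 3;
  }
  return segments;
}

// _x_ is italic only when preceded by whitespace (or the start of the text),
// followed by whitespace (or the end), with no whitespace just inside the marks.
const ITALIC = /(^|\s)_([^\s_](?:[^_\n]*[^\s_])?)_(?=\s|$)/g;

function italicsOutsideCode(text) {
  return splitCode(text)
    .map((s) => (s.code ? `<code>${s.text}</code>` : s.text.replace(ITALIC, '$1<i>$2</i>')))
    .join('');
}

console.log(italicsOutsideCode("```var my_nice_variable = 'my string';``` is _nice_"));
// <code>var my_nice_variable = 'my string';</code> is <i>nice</i>
```

With this rule, underscores inside a word or a URL, such as `https://example.com/a_b_c`, are left alone even outside code blocks.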
I've played around a bit, and the rules actually seem straightforward and intuitive. Here are my conclusions:
- Code markers interrupt and unstyle everything else.
``` these *are* _just_ ~five~ words ``` becomes these *are* _just_ ~five~ words, in monospace with the other markers left as literal characters.
*these ```are just five``` words* becomes *these, then are just five in monospace, then words*: the code span breaks the bold, so the asterisks stay literal.
- Code markers are also the only styles that work across multiple lines.
- The other three styles can be nested within each other.
*these _are ~just~ five_ words* becomes these are ~just~ five words
- When two styles conflict, the one that appeared first wins.
*these _are just * five words_ becomes these _are just five words_
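The nesting and "first one wins" rules above can be sketched as a single left-to-right scan over one line, keeping a stack of open markers. When a marker closes its own style, anything opened after it is cancelled and stays literal. This is a sketch under assumptions: `formatLine` and `TAGS` are hypothetical names, code spans are assumed to have been split out beforehand (per the first rule), the boundary rule described earlier in the thread is applied (treating an adjacent marker as a boundary too, so `*_x_*` nests), and there is no HTML escaping. In the conflict example the closing asterisk sits directly after "just", because under the boundary rule an asterisk with spaces on both sides cannot close a style.

```javascript
const TAGS = new Map([['*', 'b'], ['_', 'i'], ['~', 's']]);

// undefined (before the first / after the last character) counts as whitespace.
const isSpace = (c) => c === undefined || /\s/.test(c);
// Just outside a marker we accept whitespace, start/end of line, or another
// marker, so that directly nested styles like *_x_* still work.
const isBoundary = (c) => isSpace(c) || TAGS.has(c);

function formatLine(line) {
  const chars = [...line];   // code points, so emoji are not split
  const out = chars.slice(); // markers are replaced by tags only once paired
  const stack = [];          // currently open markers: { marker, index }

  chars.forEach((c, i) => {
    if (!TAGS.has(c)) return;
    const prev = chars[i - 1];
    const next = chars[i + 1];
    const depth = stack.findIndex((open) => open.marker === c);

    const canClose =
      depth !== -1 &&
      i > stack[depth].index + 1 && // no empty spans like **
      !isSpace(prev) &&
      isBoundary(next);

    if (canClose) {
      // The style opened first wins: markers opened after it are dropped
      // from the stack and stay literal characters.
      stack.length = depth + 1;
      const opener = stack.pop();
      const tag = TAGS.get(c);
      out[opener.index] = `<${tag}>`;
      out[i] = `</${tag}>`;
    } else if (depth === -1 && !isSpace(next) && isBoundary(prev)) {
      stack.push({ marker: c, index: i });
    }
    // Otherwise the marker is left as a literal character.
  });

  // Markers still on the stack were never closed and stay literal.
  return out.join('');
}

console.log(formatLine('*these _are ~just~ five_ words*'));
// <b>these <i>are <s>just</s> five</i> words</b>
console.log(formatLine('*these _are just* five words_'));
// <b>these _are just</b> five words_
```

Because a pair is only written when its closer is found, and closing a style cancels everything opened inside it, the emitted tags are always properly nested.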
Thank you very much for that bit about strikethrough! All this time I thought we would be forced to use CSS attributes and span tags. I can't believe I hadn't heard of that tag; this looks much more doable now.
The marked library seems relatively easy to extend: I got bold to work, but the new problem is that newlines aren't respected by default, since Markdown needs two trailing spaces at the end of a line to insert a <br>.
See the Codesandbox demo
Maybe you can make it work properly in the context of WhatsApp messages.
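For the newline problem, marked documents a `breaks` option that adds a `<br>` on single line breaks, which may be worth trying first. Without Markdown, the line breaks can be kept directly: escape each line, format it, and join the lines with `<br>`. In this sketch `linesToHtml` is a hypothetical name and `formatLine` stands in for any per-line formatter. Since code blocks may span lines, they would need to be extracted before splitting the message this way.

```javascript
// Escape HTML special characters so message text cannot inject markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Format each line independently and join with <br>, so the line breaks
// of the original message survive. `formatLine` defaults to identity.
function linesToHtml(message, formatLine = (line) => line) {
  return message
    .split(/\r?\n/)
    .map((line) => formatLine(escapeHtml(line)))
    .join('<br>');
}

console.log(linesToHtml('first line\nsecond line')); // first line<br>second line
```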