Text formatting can be quite different from HTML to DraftJS
I have made an email signature editor in DraftJS. Users can pull in their current signature from Gmail, which comes as HTML. When they open the editor, we convert the HTML to DraftJS state using draft-convert, but the formatting is often off. Sometimes the result has new lines where there were none before, and sometimes text that was on separate lines ends up on the same line.
Here is a simple example showing how the text looks in HTML and how it looks after conversion to DraftJS: https://codepen.io/JohnMaguir/pen/OQbgLZ
One idea is to manipulate the HTML before it goes to draft-convert, but it would be hard to catch all scenarios without losing any of the user's formatting.
I've also tried using a custom htmlToBlock method, where I return false to ignore any node that is just an empty div or a div that only includes a br tag, based on this: https://github.com/HubSpot/draft-convert/pull/62
This stops the insertion of some blank lines, but it doesn't fix the issue of the first two lines being pushed together, and I'm not sure it will work for more complex HTML.
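For reference, the filtering approach described above can be sketched roughly as follows. This is a minimal illustration, not the exact code from the codepen: `isBlankDiv` is a hypothetical helper name, it relies only on standard DOM node properties (`nodeName`, `nodeType`, `childNodes`, `textContent`), and the assumption that returning `false` from `htmlToBlock` skips the node comes from the linked pull request.

```javascript
// Hypothetical helper: true when the node is a <div> that would render as a
// blank line, i.e. it contains nothing except <br> elements and whitespace.
function isBlankDiv(node) {
  if (!node || String(node.nodeName).toLowerCase() !== 'div') return false;
  const children = Array.from(node.childNodes || []);
  return children.every((child) => {
    if (child.nodeType === 3) {
      // Text node: only whitespace counts as "blank".
      return child.textContent.trim() === '';
    }
    return String(child.nodeName).toLowerCase() === 'br';
  });
}

// Sketch of a custom htmlToBlock callback for draft-convert's convertFromHTML.
// Assumption (based on PR #62): returning false tells the converter to ignore
// the node, and returning undefined leaves the default handling in place.
function htmlToBlock(nodeName, node) {
  if (nodeName === 'div' && isBlankDiv(node)) {
    return false;
  }
  return undefined;
}
```

The callback would then be passed as the `htmlToBlock` option of `convertFromHTML`. Note that a div with real text followed by a trailing br is not treated as blank here, so this does not address extra lines caused by such trailing br tags.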
If this is a bug that needs to be fixed, I can propose a pull request; just point me in the right direction. Otherwise, any and all ideas would be much appreciated. If you need any more info, please let me know.
It seems that htmlToBlock is not called for br tags, so we cannot replace them with a custom block. It also seems that the first line and the second line should be in separate blocks.
I modified @JohnMaguir's example slightly, adding some details to understand what happens there: https://codepen.io/anon/pen/gvmZQy . Run it with console logging on and it gives this output:
0 "body" "<body><div>first line<div>second line<div>deep line</div></div></div><div>thrid line<div>deep line</div><div>2 deep line</div><br></div><div>fourth line</div></body>" "ul" null
1 "div" "<div>first line<div>second line<div>deep line</div></div></div>" "ul" null
2 "div" "<div>second line<div>deep line</div></div>" "ul" "unstyled"
3 "div" "<div>deep line</div>" "ul" "unstyled"
4 "div" "<div>thrid line<div>deep line</div><div>2 deep line</div><br></div>" "ul" null
5 "div" "<div>deep line</div>" "ul" "unstyled"
6 "div" "<div>deep line</div>" "ul" "unstyled"
7 "div" "<div>2 deep line</div>" "ul" "unstyled"
8 "div" "<div>2 deep line</div>" "ul" "unstyled"
9 "div" "<div>fourth line</div>" "ul" null
And draft content:
first linesecond linedeep line
thrid linedeep line
2 deep line
fourth line
The first thing that catches the eye: htmlToBlock is sometimes called twice for the same nested div. This has no visible or breaking effect on the result, though. See the log entries after "thrid line" (entries 5 to 8).
The second thing to mention: the first child block always ends up on the same line as its parent in the resulting draft content. The second, third, and later children go on separate lines, regardless of how deeply they are nested. This looks like a bug in convertFromHTML.js, which puts the first child block on the same line as the parent block.
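To make the expected result concrete, here is a small sketch of the line breaks one would expect if every div started its own line, as a browser renders it. It does not use draft-convert at all. The tree shape (a string is a text node, an array is a div) and the name `expectedLines` are made up for illustration, and the trailing br from the example is left out.

```javascript
// Hypothetical tree for the example HTML: a string is a text node,
// an array is a <div> holding its children in order.
const body = [
  ['first line', ['second line', ['deep line']]],
  ['thrid line', ['deep line'], ['2 deep line']],
  ['fourth line'],
];

// Collect one line per run of text, breaking whenever a <div> opens or closes.
function expectedLines(root) {
  const lines = [];
  let current = '';
  const flush = () => {
    if (current !== '') lines.push(current);
    current = '';
  };
  const walk = (node) => {
    if (typeof node === 'string') {
      current += node;
      return;
    }
    flush(); // a <div> starts on a new line
    node.forEach(walk);
    flush(); // and ends the line when it closes
  };
  walk(root);
  return lines;
}

console.log(expectedLines(body).join('\n'));
```

With this input the sketch yields seven separate lines, while the draft content above has only four, because each first child is merged into its parent's line.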
I think #121 should fix the core problems you are experiencing. When playing around with @dem's codepen, I was able to get the two fields to be an exact visual match. For @JohnMaguir's, I still had extra lines due to the <br>s at the end of <div>s. I'm going to investigate further to see whether this can be resolved as well.