draft-js-utils

Performance issue when converting HTML with many trailing spaces to editorState

Open sstur opened this issue 8 years ago • 4 comments

Originally posted here.

The original text follows.


Performance issue when converting HTML to editorState (with stateFromHTML from draft-js-import-html)

Below is an example of HTML that contains many trailing spaces, which causes the parsing process to slow down. The time taken depends on how large the HTML is; a fairly large block of HTML can take more than 1 minute.

I used performance.now() to measure the time taken by stateFromHTML.

"<em> Nemo develops a smaller right fin <\/em> as a result of damage to his egg during the attack, which limits his <em> swimming ability <\/em> . <br><br> Worried about Nemo's safety, Marlin embarrasses Nemo during a school field trip. <br><br> Nemo sneaks away from the reef and is <em> captured by scuba divers <\/em> . <br><br> As the boat departs, a diver accidentally knocks his <a href=\"https:\/\/en.wikipedia.org\/wiki\/Diving_mask\"> <em> diving mask <\/em> <\/a> overboard. <br><br> While attempting to save Nemo, Marlin meets Dory, a good-hearted and optimistic <a href=\"https:\/\/en.wikipedia.org\/wiki\/Paracanthurus\"> <em> regal blue tang with <\/em> <\/a> <em> short-term memory loss <\/em> . <br><br> Marlin and Dory meet three <a href=\"https:\/\/en.wikipedia.org\/wiki\/Shark\"> <em> sharks <\/em> <\/a> – <em> Bruce, Anchor and Chum <\/em> – who claim to be <em> vegetarians <\/em> ."

Time taken: 1st round: 443.50 ms, 2nd round: 481.62 ms, 3rd round: 442.65 ms
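For anyone wanting to reproduce per-round timings like these, here is a minimal sketch of a measuring helper. The name timeRounds is hypothetical (it is not part of draft-js-import-html), and the sketch assumes a runtime where performance is a global, such as a browser or a recent Node.js.

```javascript
// Hypothetical helper: runs `fn` several times and returns the elapsed
// time of each round in milliseconds, measured with performance.now().
function timeRounds(fn, rounds = 3) {
  const timings = [];
  for (let i = 0; i < rounds; i++) {
    const start = performance.now();
    fn(); // the call being measured
    timings.push(performance.now() - start);
  }
  return timings;
}
```

With the importer loaded, it could be called as timeRounds(() => stateFromHTML(html)), where html is the sample string above.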

The HTML below has the trailing spaces removed and converts without any problem, as the timings that follow it show.

"<em>Nemo develops a smaller right fin <\/em>as a result of damage to his egg during the attack, which limits his <em>swimming ability <\/em>.<br>\n<br>\nWorried about Nemo's safety, Marlin embarrasses Nemo during a school field trip.<br>\n<br>\nNemo sneaks away from the reef and is <em>captured by scuba divers <\/em>.<br>\n<br>\nAs the boat departs, a diver accidentally knocks his <a href=\"https:\/\/en.wikipedia.org\/wiki\/Diving_mask\"><em>diving mask <\/em><\/a>overboard.<br>\n<br>\nWhile attempting to save Nemo, Marlin meets Dory, a good-hearted and optimistic <a href=\"https:\/\/en.wikipedia.org\/wiki\/Paracanthurus\"><em>regal blue tang with <\/em><\/a><em>short-term memory loss <\/em>.<br>\n<br>\nMarlin and Dory meet three <a href=\"https:\/\/en.wikipedia.org\/wiki\/Shark\"><em>sharks <\/em><\/a>\u2013 <em>Bruce, Anchor and Chum <\/em>\u2013 who claim to be <em>vegetarians <\/em>."

Time taken: 1st round: 63.91 ms, 2nd round: 71.46 ms, 3rd round: 65.11 ms
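A cleanup like this could be automated before calling stateFromHTML. The sketch below uses a hypothetical helper, stripSpaceAfterTags, which removes whitespace that directly follows a tag. That is the main difference visible between the two samples; the helper does not add the line breaks after <br>. It is a plain regex, not an HTML parser, so it would also touch a literal > in text or attribute values. It can also join words when a closing inline tag is followed by a space and has no space inside it. Whether it gives the same speedup has not been measured here.

```javascript
// Hypothetical pre-processing step: drop any run of whitespace that
// immediately follows a ">" (the end of an opening or closing tag).
// Whitespace before a closing tag, e.g. "fin </em>", is left untouched.
function stripSpaceAfterTags(html) {
  return html.replace(/>\s+/g, '>');
}
```

The cleaned string would then be passed to stateFromHTML in place of the original.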

sstur, Sep 03 '17 16:09

I totally know why this is happening. It's related to trimTrailingSpace (and the similar functions trimLeadingSpace and collapseWhiteSpace are just as bad). There are performance issues with manipulating the characterMetaData associated with the text.

I think this needs to be refactored so we trim all the text and then manipulate the metadata only once.
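A minimal sketch of that idea, under stated assumptions: the importer's real internal representation is not shown in this thread, so a plain array with one entry per character stands in for the character metadata, and only the ordinary space character is treated as trimmable. The function name trimSpacesOnce is hypothetical. The point is that the trim boundaries are found by scanning the text first, and the metadata is then sliced a single time rather than once per removed character.

```javascript
// Hypothetical stand-in for the importer's text fragment:
// `text` is a string, `characterMeta` is an array with one entry per character.
function trimSpacesOnce(text, characterMeta) {
  let start = 0;
  let end = text.length;
  // Find the trim boundaries by scanning the text only.
  while (start < end && text[start] === ' ') start++;
  while (end > start && text[end - 1] === ' ') end--;
  // Slice the text and the metadata exactly once each.
  return {
    text: text.slice(start, end),
    characterMeta: characterMeta.slice(start, end),
  };
}
```

If the metadata lives in an immutable collection, the same shape should apply as long as that collection offers a comparable slice operation, so the number of metadata operations no longer grows with the number of spaces removed.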

sstur, Sep 03 '17 16:09

@sstur Is there any update on this?

danielstecki, Jun 20 '18 21:06

We are seeing a 40 s delay with a block of quoted text about 5 KB in size. Does anyone have a solution?

javascriptlove, Jun 29 '18 10:06

Not sure of your use case, but I forked React-rte and switched from stateFromHTML in this library to convertFromHTML in draft-js, which fixed the slow parsing of HTML with trailing spaces.
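For reference, a rough sketch of what that switch can look like. convertFromHTML, ContentState.createFromBlockArray, EditorState.createWithContent and EditorState.createEmpty are Draft.js exports. The wrapper name editorStateFromHTML is hypothetical, and so is the choice to pass the draft-js module in as a parameter, which is done here only so the function can be tried without a browser DOM. The two importers do not necessarily produce identical content for the same HTML, so the output is worth checking.

```javascript
// `draft` is expected to be the module object from require('draft-js').
function editorStateFromHTML(html, draft) {
  const { convertFromHTML, ContentState, EditorState } = draft;
  const { contentBlocks, entityMap } = convertFromHTML(html);
  // Guard for input that yields no blocks (for example, an empty string).
  if (!contentBlocks || contentBlocks.length === 0) {
    return EditorState.createEmpty();
  }
  const contentState = ContentState.createFromBlockArray(contentBlocks, entityMap);
  return EditorState.createWithContent(contentState);
}
```

In an application it would be called as editorStateFromHTML(html, require('draft-js')).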

danielstecki avatar Jun 29 '18 12:06 danielstecki