Strip html tags when pasting foreign content.

Open vedtam opened this issue 6 years ago • 15 comments

Hi,

Some of my users have trouble understanding that text copied from a foreign website comes with its own formatting.

Would it be possible to have an action that strips HTML tags when pasting content, or a button to reset formatting?

Thanks, E.

vedtam avatar May 04 '18 07:05 vedtam

Refer to #53.

I am unaware of how to use it as a plugin, so I added an event listener to pell content editable div.

editor.addEventListener('paste', function (e) {
  e.stopPropagation();
  e.preventDefault();
  var clipboardData = e.clipboardData || window.clipboardData;
  window.pell.exec('insertText', clipboardData.getData('Text'));
  return true;
});

akhilharihar avatar May 17 '18 12:05 akhilharihar

Cool, thanks man!

vedtam avatar May 18 '18 07:05 vedtam

Update: The above snippet resulted in unexpected line breaks on backspace and return after pasting data from a foreign website.

Gif describing the issue - https://gfycat.com/IllustriousLeanAmericanbobtail

A quick Google search led me to a Stack Overflow question - https://stackoverflow.com/questions/42920985/textcontent-without-spaces-from-formatting-text

I implemented it by creating a temporary element, as below (this is the body of the paste handler):

var clipboardData = e.clipboardData || window.clipboardData;
var tempEl = document.createElement('div');
tempEl.innerHTML = clipboardData.getData('text/html');
var text = tempEl.textContent;
// collapse line breaks and runs of whitespace into single spaces
window.pell.exec('insertText', text.replace(/[\n\r]+|[\s]{2,}/g, ' ').trim());
tempEl.remove();
return true;

This one works as expected for my use case (email), but the user needs to reformat the content because line breaks are also removed.
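If the line breaks should survive, one option is to normalize whitespace line by line instead of collapsing everything into a single line. A minimal sketch, assuming a hypothetical helper named `normalizePastedText` (not part of pell) that is fed the `text/plain` clipboard flavor. Whether a multi-line `insertText` avoids the stray line breaks mentioned in the update above has not been checked here.

```javascript
// Hypothetical helper (not part of pell): tidy pasted plain text while
// keeping its line structure.
function normalizePastedText(text) {
  return text
    .replace(/\r\n?/g, '\n') // normalize Windows and old Mac line endings
    .split('\n')
    .map(line => line.replace(/[ \t\u00a0]+/g, ' ').trim()) // collapse spaces inside each line
    .join('\n')
    .replace(/\n{3,}/g, '\n\n') // keep at most one blank line in a row
    .trim();
}
```

Inside the paste handler the result could then be passed along, for example `window.pell.exec('insertText', normalizePastedText(clipboardData.getData('text/plain')))`.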

akhilharihar avatar May 18 '18 09:05 akhilharihar

@akhilharihar Pasting multiple lines of text crashes Safari for me with this handler, and Chrome throws warnings.

oldboyxx avatar May 19 '18 22:05 oldboyxx

@oldboyxx I have a Windows PC that is managed by an admin, so I cannot test in Safari or Chrome. Can you provide more details of the error thrown in Chrome?

akhilharihar avatar May 20 '18 09:05 akhilharihar

@akhilharihar thanks for the follow-up! I implemented your second snippet and it works. There is a warning on paste, but I can live with it:

[Screenshot, 2018-06-14 20:05:28]

Thanks!

vedtam avatar Jun 14 '18 17:06 vedtam

@vedtam I got the same "We don't execute document.execCommand() this time, because it is called recursively" warning when trying to insertText or insertHTML an invalid HTML snippet.

Check the HTML snippet you're about to paste and make sure its tags are properly paired.
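A rough way to run that check before inserting, as a sketch only: `tagsArePaired` and `VOID_TAGS` are hypothetical names, the scan is regex-based rather than a real HTML parser, and it is stricter than HTML itself (it rejects legally omitted closing tags such as a bare `<li>`).

```javascript
// Elements that never take a closing tag.
const VOID_TAGS = new Set(['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr']);

// Hypothetical helper: true when every opening tag is closed in the right order.
function tagsArePaired(html) {
  const stack = [];
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(\/?)>/g;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    const [, closing, rawName, selfClosing] = match;
    const name = rawName.toLowerCase();
    if (VOID_TAGS.has(name) || selfClosing) continue; // nothing to pair
    if (!closing) {
      stack.push(name); // remember the open tag
    } else if (stack.pop() !== name) {
      return false; // close tag does not match the most recent open tag
    }
  }
  return stack.length === 0; // anything left over was never closed
}
```

For example, `tagsArePaired('<p>a<br>b</p>')` passes, while `tagsArePaired('<p><b>a</p>')` fails because `<b>` is never closed.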

I found out that the foreign HTML is formatted with style attributes, so all I need to do is remove the attributes. In Electron I did this:

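// assumes an Electron renderer where require('electron') is available
const { clipboard } = require('electron')

// when the window gains focus, rewrite the clipboard HTML without attributes
onfocus = () => {
    const HTML = clipboard.readHTML()
    // reduce each opening tag to its bare name:
    // <div class="xx" style="xx"> to <div>
    const parsedHTML = HTML.replace(/<[^/].*?>/g, i => i.split(/[ >]/g)[0] + '>').trim()
    clipboard.writeHTML(parsedHTML)
}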

P.S. forgive my poor English and regex =P

gaoryrt avatar Jul 05 '18 02:07 gaoryrt

https://www.npmjs.com/package/striptags

import pell from 'pell';
import striptags from 'striptags';

class Wysiwyg {
  constructor (element) {
    // ...
    const pellContent = this._element.querySelector('.pell-content');
    pellContent.addEventListener('paste', () => {
      // defer until the browser has inserted the pasted content
      setTimeout(() => {
        let html = this._editor.content.innerHTML;
        html = html.replace(/<(\/?)h[1-6]\b/ig, '<$1h2'); // turn every heading tag into h2
        html = striptags(html, ['h2', 'p', 'br', 'ul', 'ol', 'li']); // keep only these tags
        html = html.replace(/(<[^>]+) class=".*?"/ig, '$1'); // remove classes
        html = html.replace(/(<[^>]+) id=".*?"/ig, '$1'); // remove ids
        html = html.replace(/(<[^>]+) style=".*?"/ig, '$1'); // remove inline styles
        this._editor.content.innerHTML = html;
        this._store.value = html;
      });
    });
  }
}

salines avatar Feb 07 '19 09:02 salines

Easy way to do this, using some code from @salines

import striptags  from 'striptags';
import { exec } from 'pell';

pellElement.onpaste = function(event) {
    event.stopPropagation();
    event.preventDefault();

    const clipboardData = event.clipboardData || window.clipboardData;
    let pastedData = clipboardData.getData('Text');

    pastedData = striptags(pastedData, ['p', 'br', 'ul', 'ol', 'li']); // remove all html except the listed tags

    // lazy match plus backreference, so each pattern stops at its own closing quote
    pastedData = pastedData.replace(/\s+class=(["']).*?\1/ig, ''); // Remove classes
    pastedData = pastedData.replace(/\s+id=(["']).*?\1/ig, ''); // Remove ids
    pastedData = pastedData.replace(/\s+style=(["']).*?\1/ig, ''); // Remove inline styles
    pastedData = pastedData.replace(/\s\B/ig, ''); // Remove non linebreaking whitespace

    exec('insertHTML', pastedData);
}
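Regex-based attribute removal is sensitive to greedy quantifiers: a pattern of the form `class=["'].*["']` runs to the last quote on the line rather than the attribute's own closing quote. A small comparison on a made-up sample string (the sample and variable names are illustrative only):

```javascript
const sample = '<p class="intro" data-x="1">Hello</p><p class="outro">Bye</p>';

// Greedy: ".*" extends to the LAST quote, swallowing text and tags in between.
const greedy = sample.replace(/class=["'].*["']/ig, '');

// Lazy with a backreference: stops at the first quote matching the opening one.
const lazy = sample.replace(/\s+class=(["']).*?\1/ig, '');

console.log(greedy); // <p >Bye</p>
console.log(lazy);   // <p data-x="1">Hello</p><p>Bye</p>
```

The greedy variant silently drops the first paragraph, which is the kind of output problem that is easy to miss with short test pastes.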

Duder-onomy avatar Jun 10 '19 18:06 Duder-onomy

I had a problem with the output when using @Duder-onomy's code above, so here's a tweaked version that does the attribute clean-up with removeAttribute.

import striptags  from 'striptags';
import { exec } from 'pell';

pellInstance.onpaste = function(event) {
  event.stopPropagation()
  event.preventDefault()

  const clipboardData = event.clipboardData || window.clipboardData
  let pastedData = clipboardData.getData('text/html')

  pastedData = striptags(pastedData, ['h3', 'h2', 'h1', 'p', 'br', 'ul', 'ol', 'li']) // remove all html except the listed tags

  let wrapper = document.createElement('div')
  wrapper.innerHTML = pastedData

  let allChildren = wrapper.getElementsByTagName('*')
  for (let index = 0; index < allChildren.length; index++) {
    const element = allChildren[index]
    element.removeAttribute('id')
    element.removeAttribute('class')
    element.removeAttribute('style')
  }

  pastedData = wrapper.innerHTML

  exec('insertHTML', pastedData)
}

vasilenka avatar Jan 22 '20 18:01 vasilenka

I think it's worth considering adding striptags as a dependency in pell. This seems like an important enough feature for an editor that it might be worth slightly breaking the minimalist goal of this library. I bet there is a clean way that pell could set up striptags to be configured. Just an idea. Thanks for pell!

wgwz avatar Feb 04 '20 19:02 wgwz

Please note that the snippet @vasilenka supplied (thank you!) will prevent text/plain content from being pasted. Here is a modified version that allows both html and plain text pasting:

import striptags from 'striptags';
import { exec } from 'pell';

editor.onpaste = function(event) {
  event.stopPropagation()
  event.preventDefault()

  const clipboardData = event.clipboardData || window.clipboardData
  let pastedData = clipboardData.getData('text/html')
  if (pastedData === '') {
    // no HTML on the clipboard: fall back to plain text
    pastedData = clipboardData.getData('text/plain')
  }

  const tagWhiteList = ['h3', 'h2', 'h1', 'p', 'br', 'ul', 'ol', 'li', 'a']
  pastedData = striptags(pastedData, tagWhiteList) // remove all html except the listed tags

  let wrapper = document.createElement('div')
  wrapper.innerHTML = pastedData

  // strip presentational attributes from every remaining element
  let allChildren = wrapper.getElementsByTagName('*')
  for (let index = 0; index < allChildren.length; index++) {
    const element = allChildren[index]
    element.removeAttribute('id')
    element.removeAttribute('class')
    element.removeAttribute('style')
  }

  pastedData = wrapper.innerHTML

  exec('insertHTML', pastedData)
}

Reference: https://developer.mozilla.org/en-US/docs/Web/API/DataTransfer/getData
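One thing to watch with a plain-text fallback: the text still ends up in `insertHTML`, where newlines collapse and characters such as `<` are read as markup. A possible pre-processing step, sketched with a hypothetical helper `plainTextToHtml` (not part of pell or striptags):

```javascript
// Hypothetical helper: make plain text safe for insertHTML and keep line breaks.
function plainTextToHtml(text) {
  const escaped = text
    .replace(/&/g, '&amp;') // escape ampersands first so later entities stay intact
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return escaped.replace(/\r\n?|\n/g, '<br>'); // one <br> per line break
}
```

In the handler this would apply only when `getData('text/html')` comes back empty, for example by calling `exec('insertHTML', plainTextToHtml(clipboardData.getData('text/plain')))` and returning before the tag-stripping steps.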

wgwz avatar Feb 04 '20 22:02 wgwz

Hey! @wgwz thanks so much for the updated snippet, it works like a dream 👍 However, I keep getting this warning in the console (only in Chrome) after pasting into the pell editor: "We don't execute document.execCommand() this time, because it is called recursively." Is this something I should be worried about? Many thanks!

gkuodyte avatar Feb 07 '20 12:02 gkuodyte

@gkuodyte You may have to search Google for that warning; it is likely caused by something else.

jaredreich avatar Feb 07 '20 22:02 jaredreich

Just some other notes to go along with the snippet I shared:

  • Be aware that content pasted from certain places, e.g. Google Docs, will appear broken with my snippet. For it to work, you'll need to add elements to tagWhiteList and also change which attributes you remove.
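For the attribute side, one option is to flip the logic and keep only an explicit list of attributes instead of removing three named ones. A minimal sketch, assuming a hypothetical helper `keepOnlyAttributes`; which attributes a given source actually needs is something to find out by inspecting its pasted HTML:

```javascript
// Hypothetical helper: drop every attribute that is not explicitly allowed.
// Works on anything exposing DOM-like `attributes` and `removeAttribute`.
function keepOnlyAttributes(element, allowed) {
  // Copy the names first, because the attribute list changes during removal.
  const names = Array.from(element.attributes, attr => attr.name);
  for (const name of names) {
    if (!allowed.includes(name.toLowerCase())) {
      element.removeAttribute(name);
    }
  }
}
```

Inside the loop over `allChildren`, the three `removeAttribute` calls could then become `keepOnlyAttributes(element, ['href'])`, so that links keep their target while everything else is dropped.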

wgwz avatar Feb 16 '20 00:02 wgwz