Whitespace around links

Open jtsylve opened this issue 6 years ago • 7 comments

When converting text that contains HTML links, the resulting Markdown links have no spaces before or after them, so they are concatenated with the adjoining words.

jtsylve, Jun 23 '18 05:06

I'm afraid I can't reproduce the issue you describe. Would you be able to give an example of the HTML you are trying to convert?

Thanks

domchristie, Jun 23 '18 09:06

Sure

<meta charset='utf-8'><h3 style="box-sizing: border-box; font-weight: 600; margin: 24px 0px 16px; color: rgb(17, 17, 17); padding-bottom: 0rem; line-height: 1.25; font-size: 1.25rem; letter-spacing: -0.03rem; font-family: &quot;Source Sans Pro&quot;, &quot;Lucida Grande&quot;, sans-serif; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">See the<span> </span><a href="https://github.com/domchristie/to-markdown/wiki/Migrating-from-to-markdown-to-Turndown" rel="nofollow" style="background-color: transparent; box-sizing: border-box; color: rgb(203, 56, 55); text-decoration: none; font-size: 1em; font-weight: 700;">migration guide</a><span> </span>for details</h3>

Now that I look at it, it likely has more to do with the spaces inside the empty span tags before and after the link not being rendered. The markup is odd because it's generated by the clipboard: I've got a CodeMirror editor and I'm converting pasted text to Markdown:

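// CodeMirror paste handler: read the HTML flavour of the clipboard and
// convert it to Markdown with a TurndownService instance (this.htmlToMarkdown).
this.editor.on('paste', (cm, e) => {
    // getData returns "" when the clipboard has no text/html entry.
    const html = e.clipboardData.getData('text/html');

    if (html === '') {
        // Plain-text paste: nothing to convert.
        this.setState({ pasted: null });
    } else {
        this.setState({ pasted: this.htmlToMarkdown.turndown(html) });
    }
});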
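
One way to test the empty-span theory is to strip those spans before conversion. This is a minimal sketch, assuming the clipboard HTML wraps each separating space in an attribute-less `<span>` exactly as in the sample above; `unwrapWhitespaceSpans` is a hypothetical helper, not part of Turndown, and whether it avoids the concatenation in this editor has not been verified.

```javascript
// Hypothetical helper (not part of Turndown's API): replaces <span>
// elements that have no attributes and contain only whitespace with a
// single plain space, so the surrounding text keeps its word boundary.
// A regular expression is a rough stand-in for a real HTML parser and
// will miss spans with attributes or nested markup.
function unwrapWhitespaceSpans(html) {
  // In JavaScript, \s also matches U+00A0 (non-breaking space).
  return html.replace(/<span>\s+<\/span>/gi, ' ');
}

const pasted =
  'See the<span> </span><a href="https://github.com/domchristie/to-markdown/wiki/Migrating-from-to-markdown-to-Turndown">migration guide</a><span> </span>for details';

// Logs the HTML with both whitespace-only spans replaced by ordinary spaces.
console.log(unwrapWhitespaceSpans(pasted));
```

In the paste handler above it would wrap the argument, e.g. `this.htmlToMarkdown.turndown(unwrapWhitespaceSpans(html))`. Parsing with `DOMParser` and replacing whitespace-only spans in the DOM would be more robust than a regex, at the cost of only running in a browser.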

jtsylve, Jun 23 '18 16:06

When converting text that contains HTML links the Markdown links do not have spaces before or after them, causing them to be concatenated to the adjoining words.

Just to clarify, are you seeing too much whitespace around those links, or none at all?

When I convert the following:

<meta charset='utf-8'><h3 style="box-sizing: border-box; font-weight: 600; margin: 24px 0px 16px; color: rgb(17, 17, 17); padding-bottom: 0rem; line-height: 1.25; font-size: 1.25rem; letter-spacing: -0.03rem; font-family: &quot;Source Sans Pro&quot;, &quot;Lucida Grande&quot;, sans-serif; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">See the<span> </span><a href="https://github.com/domchristie/to-markdown/wiki/Migrating-from-to-markdown-to-Turndown" rel="nofollow" style="background-color: transparent; box-sizing: border-box; color: rgb(203, 56, 55); text-decoration: none; font-size: 1em; font-weight: 700;">migration guide</a><span> </span>for details</h3>

I get:

### See the  [migration guide](https://github.com/domchristie/to-markdown/wiki/Migrating-from-to-markdown-to-Turndown)  for details

There are two spaces instead of the expected one, which is a bug, but I'm not seeing words being concatenated.
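
Until the double-space output is fixed, a possible stopgap is to post-process the Markdown. This is a hedged sketch: `collapseInnerSpaces` is a hypothetical helper, and it assumes doubled spaces between words carry no meaning. That does not hold inside code spans, fenced code blocks, or aligned tables, so it should not be applied blindly.

```javascript
// Hypothetical post-processing helper (not part of Turndown's API):
// collapses runs of two or more spaces that sit between two
// non-whitespace characters on the same line.
function collapseInnerSpaces(markdown) {
  // (\S) keeps the preceding character; the lookahead (?=\S) requires
  // text after the run, so a trailing "  " hard break before a newline
  // and leading indentation are not touched.
  return markdown.replace(/(\S) {2,}(?=\S)/g, '$1 ');
}

const output =
  '### See the  [migration guide](https://github.com/domchristie/to-markdown/wiki/Migrating-from-to-markdown-to-Turndown)  for details';

// Logs the heading with a single space on each side of the link.
console.log(collapseInnerSpaces(output));
```

Because the run must have text on both sides, Markdown hard line breaks (two trailing spaces) survive; only the gaps between words are normalized.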

domchristie, Jun 25 '18 21:06

Odd, I'm seeing the links concatenated with the words around them.

jtsylve, Jun 25 '18 23:06

What happens when you paste in:

<meta charset='utf-8'><h3 style="box-sizing: border-box; font-weight: 600; margin: 24px 0px 16px; color: rgb(17, 17, 17); padding-bottom: 0rem; line-height: 1.25; font-size: 1.25rem; letter-spacing: -0.03rem; font-family: &quot;Source Sans Pro&quot;, &quot;Lucida Grande&quot;, sans-serif; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">See the<span> </span><a href="https://github.com/domchristie/to-markdown/wiki/Migrating-from-to-markdown-to-Turndown" rel="nofollow" style="background-color: transparent; box-sizing: border-box; color: rgb(203, 56, 55); text-decoration: none; font-size: 1em; font-weight: 700;">migration guide</a><span> </span>for details</h3>

into http://domchristie.github.io/turndown/ ?

How are you using Turndown: browser or Node, and which version?

domchristie, Jun 26 '18 09:06

I often see whitespace problems around links, and between some words, when I export a heavily edited, commented, etc. doc from Google Docs and then convert the HTML with Turndown (the browser version, which I prefer because I like having links in referenced style, for example). I think it's mostly because &nbsp; gets converted to no space, so I manually replace those with normal spaces before converting.
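
The manual replacement described here can be automated. A minimal sketch, assuming every non-breaking space in the exported HTML is safe to treat as an ordinary space; `replaceNbsp` is a hypothetical name, and the function would run on the HTML string before it is passed to Turndown.

```javascript
// Hypothetical helper automating the manual workaround: turn
// non-breaking spaces into ordinary spaces before conversion. It covers
// the named entity, the decimal and hex numeric references, and the
// literal U+00A0 character (the i flag also accepts &#XA0; and &NBSP;).
function replaceNbsp(html) {
  return html.replace(/&nbsp;|&#160;|&#xa0;|\u00a0/gi, ' ');
}

const exported =
  'See&nbsp;the <a href="https://example.com/">link</a>&#160;here\u00a0now';

// Logs the string with all four non-breaking spaces replaced.
console.log(replaceNbsp(exported));
```

This also removes intentional non-breaking spaces (for example inside `<pre>` blocks), and numeric references with leading zeros such as `&#00160;` are not covered.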

jlehtinen, Oct 26 '18 11:10

Not to necro this, but I'm running into the exact same thing, and I suspect it's an eccentricity of rich text / the clipboard API rather than Turndown: copying and pasting the body of, say, https://jmduke.com/ into a CodeMirror editor results in missing spaces before/after hyperlinks, whereas grabbing the page source works fine. I'll try to track this down this weekend and update the issue!

jmduke, Feb 15 '19 00:02