turndown
Whitespace around links
When converting text that contains HTML links, the Markdown links do not have spaces before or after them, causing them to be concatenated to the adjoining words.
I'm afraid I can't reproduce the issue you describe. Would you be able to give an example of the HTML you are trying to convert?
Thanks
Sure
<meta charset='utf-8'><h3 style="box-sizing: border-box; font-weight: 600; margin: 24px 0px 16px; color: rgb(17, 17, 17); padding-bottom: 0rem; line-height: 1.25; font-size: 1.25rem; letter-spacing: -0.03rem; font-family: "Source Sans Pro", "Lucida Grande", sans-serif; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">See the<span> </span><a href="https://github.com/domchristie/to-markdown/wiki/Migrating-from-to-markdown-to-Turndown" rel="nofollow" style="background-color: transparent; box-sizing: border-box; color: rgb(203, 56, 55); text-decoration: none; font-size: 1em; font-weight: 700;">migration guide</a><span> </span>for details</h3>
Now that I look at it, it likely has more to do with not rendering the spaces from the empty <span> tags that appear before and after the link. This HTML is weird because it's generated by the clipboard. I've got a CodeMirror editor and I'm converting pasted text to Markdown:
this.editor.on('paste', (cm, e) => {
  let html = e.clipboardData.getData('text/html');
  if (html === "") {
    this.setState({ pasted: null })
  } else {
    this.setState({ pasted: this.htmlToMarkdown.turndown(html) })
  }
})
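One way to test the empty-span hypothesis is to unwrap whitespace-only spans before the HTML reaches Turndown. The following is only a sketch: `unwrapWhitespaceSpans` is a hypothetical helper, not part of Turndown, and it uses a regular expression rather than a real HTML parser, so it assumes simple, un-nested spans like the ones in the clipboard HTML above.

```javascript
// Hypothetical pre-processing helper (not a Turndown API).
// Replaces any <span ...> that contains only whitespace with a single space,
// so the space survives as plain text next to the link.
function unwrapWhitespaceSpans(html) {
  return html.replace(/<span\b[^>]*>\s+<\/span>/gi, ' ');
}

// Spans with real content are left untouched.
const sample =
  'See the<span> </span><a href="x">migration guide</a><span> </span>for details';
console.log(unwrapWhitespaceSpans(sample));
```

In the paste handler this would be applied just before conversion, for example `this.htmlToMarkdown.turndown(unwrapWhitespaceSpans(html))`. If the concatenation persists after this step, the spans are probably not the cause.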
When converting text that contains HTML links, the Markdown links do not have spaces before or after them, causing them to be concatenated to the adjoining words.
Just to clarify, are you seeing too much white space around those links, or no whitespace?
When I convert the following:
<meta charset='utf-8'><h3 style="box-sizing: border-box; font-weight: 600; margin: 24px 0px 16px; color: rgb(17, 17, 17); padding-bottom: 0rem; line-height: 1.25; font-size: 1.25rem; letter-spacing: -0.03rem; font-family: "Source Sans Pro", "Lucida Grande", sans-serif; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">See the<span> </span><a href="https://github.com/domchristie/to-markdown/wiki/Migrating-from-to-markdown-to-Turndown" rel="nofollow" style="background-color: transparent; box-sizing: border-box; color: rgb(203, 56, 55); text-decoration: none; font-size: 1em; font-weight: 700;">migration guide</a><span> </span>for details</h3>
I get:
### See the  [migration guide](https://github.com/domchristie/to-markdown/wiki/Migrating-from-to-markdown-to-Turndown)  for details
There are two spaces instead of the expected one, which is a bug, but I'm not seeing words being concatenated.
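Until the double-space bug is fixed, the output can be tidied after conversion. This is a rough sketch, not something Turndown provides: `collapseInlineSpaces` is a hypothetical helper, and it assumes the Markdown contains no code spans or code blocks whose internal spacing matters.

```javascript
// Hypothetical post-processing helper (not a Turndown API).
// Collapses runs of two or more spaces that sit between two non-space
// characters. Leading indentation and trailing double-space hard line
// breaks are left alone because they are not flanked by non-space
// characters on both sides.
function collapseInlineSpaces(markdown) {
  return markdown.replace(/(\S) {2,}(?=\S)/g, '$1 ');
}

const converted = '### See the  [migration guide](https://example.com)  for details';
console.log(collapseInlineSpaces(converted));
```

The `https://example.com` URL is a placeholder for illustration only.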
Odd, I'm seeing the links concatenated with the words around it
(Sent by email in reply to Dom Christie's comment of Mon, Jun 25, 2018, 16:39: https://github.com/domchristie/turndown/issues/237#issuecomment-400104570)
What happens when you paste in:
<meta charset='utf-8'><h3 style="box-sizing: border-box; font-weight: 600; margin: 24px 0px 16px; color: rgb(17, 17, 17); padding-bottom: 0rem; line-height: 1.25; font-size: 1.25rem; letter-spacing: -0.03rem; font-family: "Source Sans Pro", "Lucida Grande", sans-serif; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">See the<span> </span><a href="https://github.com/domchristie/to-markdown/wiki/Migrating-from-to-markdown-to-Turndown" rel="nofollow" style="background-color: transparent; box-sizing: border-box; color: rgb(203, 56, 55); text-decoration: none; font-size: 1em; font-weight: 700;">migration guide</a><span> </span>for details</h3>
into http://domchristie.github.io/turndown/ ?
How are you using Turndown: browser or node, and which version?
I often saw whitespace problems around links and between some words when I export a heavily edited, commented, etc. doc from Google Docs as HTML, then convert it with Turndown (browser version, which I prefer because I like to have links as referenced, for example). I think it's mostly because &nbsp; gets converted to no space, so I manually replace those with normal spaces before converting.
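The manual replacement of non-breaking spaces can be scripted. The sketch below assumes the non-breaking spaces appear either as the `&nbsp;` entity, as the numeric references `&#160;` or `&#xA0;`, or as the literal U+00A0 character. `normalizeNbsp` is a hypothetical name, not a Turndown function.

```javascript
// Hypothetical pre-processing helper (not a Turndown API).
// Replaces non-breaking spaces, in entity or literal form, with ordinary
// spaces before the HTML is handed to the converter.
function normalizeNbsp(html) {
  return html.replace(/&nbsp;|&#160;|&#xA0;|\u00A0/gi, ' ');
}

const exported = 'See&nbsp;<a href="#">the guide</a>\u00A0for details';
console.log(normalizeNbsp(exported));
```

Note that this discards intentional non-breaking spaces as well, so it is best limited to input such as Google Docs exports where they are known to be noise.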
not to necro this but I'm running into the exact same thing, and I suspect it's an eccentricity with rich text / the clipboard API rather than turndown: copying and pasting the body of, say, https://jmduke.com/, into a codemirror results in the lack of spaces before/after hyperlinks, whereas grabbing the view source is fine. Will try and track this down this weekend and update the issue!