clipboard2markdown output has hard spaces which are rejected in pandoc

Hi, I really appreciate this project. I've just had some trouble using the output. When I put my text files through pandoc, I get a complaint about hard spaces that have been inserted. I think it's LaTeX that is rejecting them. Let me know if you want more details. Some representations of the spaces are \20.

Mar 27 '18 14:03 colcord

I'm not sure what the cause of this issue could be. It may be that the text in the clipboard already contains these characters and the markdown conversion preserves them. It could also be the library I used to convert the HTML to markdown.

Someone previously contributed a patch that added a lot of support for pandoc and I notice there are a lot of replacements done on the stream. Perhaps you'd like to add one that replaces this \20 with a space in a pull request?

Mar 27 '18 14:03 euangoddard

Wow. What a fast response. I'm not a programmer, so I can't write a patch, unfortunately. Let me find another example and post it here, just so we have a clear test. Thanks for your prompt response.

Mar 27 '18 15:03 colcord

Hi, I've tested this again. This is the latest web page which caused problems:

https://www.quora.com/What-is-the-best-textbook-for-Category-theory

thanks, Frank

Mar 28 '18 19:03 colcord

Hi Frank,

I'll see what I can do. This project is pretty much unmaintained, so I'll need to find some spare time to look into this.

Mar 29 '18 07:03 euangoddard

Hi Euan, I just noticed another related bug. When text is in italics, the space before the first asterisk is a hard space. I've just looked at the JavaScript, and I don't see where it returns a single asterisk as the replacement for italics. I see that it would return an underscore, but I haven't seen that in my results. Is most of the conversion done by to-markdown? Looking around, I see that Dom Christie has updated his to-markdown project to turndown (https://github.com/domchristie/turndown), and it looks as if he is maintaining it. I don't see a project that uses that code in a way that is as easy to use as yours. I wish I could make the changes myself. Kind regards, Frank

{ filter: ['em', 'i'], replacement: function (content) { return '_' + content + '_' } },

Apr 01 '18 13:04 colcord

@colcord, try replacing this part:

              .replace(/[ ]+\n/g, '\n')

with:

              .replace(/[ ]+\n/g, '\n')
              .replace(/\u00a0/g, ' ')

Apr 03 '18 07:04 epsil
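
For anyone who wants to try the suggested replacement outside the project, here is a minimal standalone sketch. The function name `normalizeHardSpaces` and the sample string are made up for illustration and are not part of clipboard2markdown; only the two `.replace` calls come from the suggestion above. It assumes the hard spaces are U+00A0 (NO-BREAK SPACE) characters.

```javascript
// Hypothetical helper, not part of clipboard2markdown: applies the two
// replacements from the suggestion above to an already-converted string.
function normalizeHardSpaces(markdown) {
  return markdown
    .replace(/[ ]+\n/g, '\n')  // strip ordinary spaces at the end of a line
    .replace(/\u00a0/g, ' ');  // turn each U+00A0 hard space into a plain space
}

// Made-up sample: hard spaces around an italic span, plus trailing spaces.
const sample = 'some\u00a0*italic*\u00a0text  \nnext line';

// JSON.stringify makes any remaining non-ASCII or whitespace visible.
console.log(JSON.stringify(normalizeHardSpaces(sample)));
// prints "some *italic* text\nnext line"
```

One thing to watch: with this order, a hard space sitting directly before a line break becomes an ordinary trailing space. Running the U+00A0 replacement first would avoid that.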
