clipboard2markdown

output has hard spaces which are rejected in pandoc

Open colcord opened this issue 6 years ago • 6 comments

Hi, I really appreciate this project. I've just had some trouble using the output. When I put my text files through pandoc, I get a complaint about hard spaces that have been inserted. I think it's LaTeX that is rejecting them. Let me know if you want more details. Some of the spaces are represented as \20.

colcord avatar Mar 27 '18 14:03 colcord

I'm not sure what the cause of this issue could be. It may be that the text in the clipboard contains these characters and the markdown conversion preserves them. It could also be the library I used to convert the HTML to markdown.

Someone previously contributed a patch that added a lot of support for pandoc and I notice there are a lot of replacements done on the stream. Perhaps you'd like to add one that replaces this \20 with a space in a pull request?

euangoddard avatar Mar 27 '18 14:03 euangoddard

Wow. What a fast response. I'm not a programmer, so I can't write a patch, unfortunately. Let me find another example and post it here, just so we have a clear test. Thanks for your prompt response.

colcord avatar Mar 27 '18 15:03 colcord

Hi, I've tested this again. This is the latest web page which caused problems:

https://www.quora.com/What-is-the-best-textbook-for-Category-theory

thanks, Frank

colcord avatar Mar 28 '18 19:03 colcord

Hi Frank,

This project is pretty much unmaintained, so I'll need to find some spare time to look into this. I'll see what I can do.

euangoddard avatar Mar 29 '18 07:03 euangoddard

Hi Euan, I just noticed another related bug. When text is in italics, the space before the first asterisk is a hard space. I've just looked at the JavaScript, and I don't see where it returns a single asterisk as the replacement for italics. I see that it would return an underscore, but I haven't seen that in my results. Is most of the conversion done by to-markdown? Looking around, I see that Dom Christie has updated his to-markdown project to turndown (https://github.com/domchristie/turndown), and it looks as if he is maintaining it. I don't see another project that uses that code in a way that is as easy to use as yours. I wish I could make the changes myself. Kind regards, Frank

{ filter: ['em', 'i'], replacement: function (content) { return '_' + content + '_' } },

colcord avatar Apr 01 '18 13:04 colcord

@colcord, try replacing this part:

              .replace(/[ ]+\n/g, '\n')

with:

              .replace(/[ ]+\n/g, '\n')
              .replace(/\u00a0/g, ' ')

epsil avatar Apr 03 '18 07:04 epsil
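
For anyone who wants to try the suggested replacement outside the project, here is a minimal, self-contained sketch. The function name `cleanMarkdown` and the sample string are hypothetical and not taken from the clipboard2markdown source; only the two `.replace` calls come from the suggestion above.

```javascript
// Hypothetical helper, not part of clipboard2markdown: tidies whitespace
// in converted Markdown before the text is handed to pandoc.
function cleanMarkdown(markdown) {
  return markdown
    .replace(/\u00a0/g, ' ')    // hard (non-breaking) space, U+00A0 -> ordinary space
    .replace(/[ ]+\n/g, '\n');  // strip trailing spaces at the end of each line
}

// Sample input with a hard space before the italic marker.
const sample = 'some\u00a0*italic* text \nnext line';
console.log(JSON.stringify(cleanMarkdown(sample)));
```

In this sketch the hard-space replacement runs first, so a hard space at the end of a line is also stripped by the trailing-space rule. With the order given in the suggestion above, such a space would be left behind as an ordinary trailing space.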