mdpdf icon indicating copy to clipboard operation
mdpdf copied to clipboard

Paragraph Breaking

Open mcqj opened this issue 5 years ago • 20 comments

Using version 1.7

mdpdf is treating separate lines in the markdown input as separate paragraphs. Lines of text that are not separated by a blank line should be treated as forming part of the same paragraph, according to the spec. New paragraphs should start after a blank line.

mcqj avatar Aug 16 '18 00:08 mcqj

Hello @mcqj, sorry for the slow reply I've been away camping for the week. What spec are you looking at? There are multiple markdown specs which often conflict in some places.

elliotblackburn avatar Aug 24 '18 15:08 elliotblackburn

No problem @BlueHatbRit and thanks for the reply.

I'm looking at GFM spec here: https://github.github.com/gfm/#paragraphs ]]

CommonMark seems to be the same https://spec.commonmark.org/0.28/#paragraphs

mcqj avatar Aug 27 '18 13:08 mcqj

Wonderful thank you, mdpdf uses the Showdown.js module to convert markdown into HTML, that follows the requirement of a double new line to signify a new line in markdown. I'll take a look and see what spec they're following, but this doesn't appear to be a bug from their perspective, or at least it would be a very obvious one if it were.

I'll have a dig around and see what I can find, but if it's not considered a bug in Showdown.js, then I expect for now mdpdf will follow that. Until we're able to implement our own md->html parser that is.

elliotblackburn avatar Aug 27 '18 18:08 elliotblackburn

Yeah okay Showdown is following the original "spec" (used loosely) by John Gruber, which has been derived from his original perl implementation, and that's what they attempt to stick closest to. This implementation seems to be correct:

A paragraph is simply one or more consecutive lines of text, separated by one or more blank lines.

So there must be one or more blank lines between two lines of text to denote a new paragraph split. For now this is the implementation we're going to go with, however if / when I ever get time to implement a new markdown->html parser (as an option) I will probably look at the github markdown spec, or common mark first.

This is of course open to PR's if anyone wishes to attempt it!

elliotblackburn avatar Aug 27 '18 18:08 elliotblackburn

@BlueHatbRit I'm OK with that but that is not the behaviour I am seeing. What I am seeing is that text separated by a new line (in other words zero blank) lines is resulting in a new paragraph.

Let me illustrate with some examples:-

The following input:-

This is a test with new line here
and another here
end.

produces this output:- screen shot 2018-08-28 at 12 30 59 while this input:-

This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. This is a test with along single line. And now a single line break HERE
and new line.

produces this output:- screen shot 2018-08-28 at 12 33 21

As you can see, line breaks produce new paragraphs without there being one or more blank lines as required by Gruber.

mcqj avatar Aug 28 '18 11:08 mcqj

Ah right I see what you're describing now, a picture can mean 1000 words I guess. Thanks for reporting this, this does not seem to lineup with showdown implementation. I'll check this out with our version of showdown, it might be a case of simply making sure we're on the latest version. Otherwise, this will most likely be a bug in showdown itself, though the websites demo version creates what we expect.

Thanks for providing that detailed example!

elliotblackburn avatar Aug 28 '18 11:08 elliotblackburn

Just wanted to weigh in and say that fixing these new lines after single line breaks is the one and only thing on my wishlist for mdpdf 👍

Because it's coming from the HTML generation, it doesn't really seem like something that can be patched with CSS tweaks, otherwise I'd drop in a style fix as placeholder.

Thanks for your hard work on this library!

owap avatar May 01 '19 15:05 owap

I'll add my support to allow us to write markdown properly in our text editor with a 70 chars line width but put all lines of a same paragraph on the same html line.

For now I replaced the flavor of showdown.setFlavor('github'); in index.js

lud avatar Jan 23 '20 17:01 lud

Hey all, thanks for keeping on this, I appreciate seeing what people want from the library and the thoughts and work towards a good solution are much appreciated.

@lud which flavour did you select, "vanilla"? I'm up for considering a change to the flavour, but I think I'd want to keep the default as github to avoid a breaking change. The reservation I have is that we're then asking the users to choose between several sets of features, neither of which seems to be ideal. We lose quite a bit when straying from the github flavour (checkboxes being one example).

The option would let people select what is more important to them, but I fear it'll turn into more issues simply as a result of several sets being supported. I'm not too worried about it, and I'd take the hit but I would want to hear some opinions on adding the flavour cli argument.

This whole thing makes me want to roll a home grown markdown converter to have a bit more control, but I really don't think that's something I have time to do, nor do I really have the capacity to maintain that long term. It's quite a big undertaking and showdown has been very good thus far.

elliotblackburn avatar Jan 23 '20 19:01 elliotblackburn

Hey :)

I personally do not care about the flavour as long as it treat single line breaks as mere whitespace, I assumed vanilla would have some improvements over the original one so I picked it.

I also kept gh styles : ./node_modules/.bin/mdpdf mydoc.md --style=style.css --gh-style

I don't understand why github decided to break html lines for each markdown line. There are many ways to write a line break (two space at EOL, a \ at EOL, or even a plain <br/>) but there are no ways to say "keep all in one line" with the current github syntax.

To me it looks like some github developer tried to be smartest that them are.

Anyway, I understand that providing too much option leads to inconsistency and to much combinations. I picked your project because it seemed fine (and it is !) but actually I could also use a simple html2pdf command. But I like the header/footer, style option and it just works out of the box.

I still think you could provide an option to pass the flavour to showdown. Yes it could lead to some inconsistencies but you can state that it is the user responsibility to make choices that are compatible, or stick to the defaults.

lud avatar Jan 23 '20 22:01 lud

You know, I'm not entirely sure that the linebreak in normal github markdown does force a line break see - https://gist.github.com/BlueHatbRit/0b9d580d2f7a025cefbcc4fbb20d8dbf (raw vs rendered). It looks like it may well be a bug in showdown which you could open a bug report for?

That said, offering the flavour option may not be a bad idea in general. People prefer may different implementations, my desire originally was for checklists. If we default to github that should be fine. I'm open to PR's for this, otherwise I'll get to it as soon as I have some time to hack on the project again.

elliotblackburn avatar Jan 24 '20 21:01 elliotblackburn

Well the behaviour is different than in the comments then !

But I remember Github changed the behaviour someday for READMEs, so it seems that they stiuck to it, and showdown did not follow. Maybe.

lud avatar Jan 24 '20 22:01 lud

Per this comment on an issue in showdown, I believe this can be fixed with by setting simpleLineBreaks to false. For example:

function parseMarkdownToHtml(markdown, convertEmojis, enableHighlight) {
  showdown.setFlavor('github');
  const options = {
    prefixHeaderId: false,
    ghCompatibleHeaderId: true,
    simpleLineBreaks: false,
    extensions: []
  };

I've tested locally and it appears to have the desired effect.

amlweems avatar May 14 '20 00:05 amlweems

Thanks for that amlweems, that looks like a good toggle, I didn't see that in the docs when I went through the first time.

I'm going to have a think about this over this week. On the one hand I don't want to break existing pdfs and force a major version bump. On the other hand, it seems like something that's not really worth a toggle on it's own. We could perhaps expose flavours and have a github and github-breaks one which activates this toggle.

elliotblackburn avatar May 19 '20 10:05 elliotblackburn

If that would me be, i'd turn convertEmojis and enableHighlight into an options object that also has a parserOptions property, and just forward this property to the parser without adding current options (like prefixHeaderId: false) as defaults as the user expects the parser defaults to be … defaults.

(Or just add parserOptions as a fourth argument but this is not futureproof).

So the user can do anything it wants with the parser.

And then bump major version.

lud avatar May 19 '20 10:05 lud

A flavour toggle seems like an excellent idea to me. I've got 70-chars large markdown paragraphs that end up filling only half the width of an A4 paper sheet because each newline has been converted to <br>.

I'm confident this is NOT the standard behaviour expected when rendering markdown.

A fix would make me very grateful.

Keksoj avatar Jun 09 '20 16:06 Keksoj

Looking forward to this myself. EDIT: simpleLineBreaks: false as reported above works perfectly.

mcg1969 avatar Mar 23 '21 14:03 mcg1969

Thanks for that amlweems, that looks like a good toggle, I didn't see that in the docs when I went through the first time.

I'm going to have a think about this over this week. On the one hand I don't want to break existing pdfs and force a major version bump. On the other hand, it seems like something that's not really worth a toggle on it's own. We could perhaps expose flavours and have a github and github-breaks one which activates this toggle.

Hi! Any updates on this? I thought the --gh-style flag would use simpleLineBreaks: false as that is what GH uses. I can probably submit a PR to fix this. It seems like an easy fix, I'd add a parameter to this function https://github.com/BlueHatbRit/mdpdf/blob/4c28f7e3e46661a041214356d8fe75ae2e3142d3/src/index.js#L51 to set the simpleLineBreaks option in the options object. I don't think there's a way to test it though

kito-inv avatar Aug 08 '22 19:08 kito-inv

No update I'm afraid, I've not had any time to look at this recently unfortunately. If that's the case then setting that option seems reasonable to me although it is a breaking change for the default settings so we'll want to roll it into a major version bump.

Would be happy to review a PR for this though and if all is well I can do the release.

elliotblackburn avatar Aug 13 '22 22:08 elliotblackburn

@BlueHatbRit I've submitted a PR to fix this. I can't add you as a reviewer so I'm just letting you know here

kito-inv avatar Aug 17 '22 19:08 kito-inv

Just a very humble nag to get this committed, please!

I use mdpdf to generate monolithic PDFs of embedded docs from commands (which have to be word wrapped manually) and the PDFs look "meh", but as the embedded docs are more important I have to leave those alone for now.

pgalbavy-itrs avatar Jun 29 '23 07:06 pgalbavy-itrs

@pgalbavy-itrs It's already fixed in this PR but @BlueHatbRit needs to review it. In the meantime, you could clone that branch and run it locally, I don't remember how to do so sadly.

kito-inv avatar Jun 29 '23 15:06 kito-inv

@pgalbavy-itrs It's already fixed in this PR but @BlueHatbRit needs to review it. In the meantime, you could clone that branch and run it locally, I don't remember how to do so sadly.

Thanks - but I'm using this in a docker node container to build the PDFs and then just pull them out. I don't know enough npm command line to change the source etc.

pgalbavy-itrs avatar Jun 29 '23 15:06 pgalbavy-itrs

I've merged the PR fix into master, it'll go out with the next release.

elliotblackburn avatar Dec 28 '23 16:12 elliotblackburn