google-docs-to-markdown

Treat two adjacent paragraphs where both paragraphs have zero margin as two lines

Open · tamashalasi opened this issue 2 years ago · 1 comment

By default, Google Docs gives paragraphs a margin of 0, meaning that two adjacent paragraphs look as if they had only a line break between them.

Current behavior: [image: a p with zero margin is turned into a paragraph]

Corresponding markup: [image: corresponding markup]

Expected behavior: [image: what looks like a line break should probably be a line break]

So: if a paragraph has margin-bottom:0 and the next element is also a paragraph with margin-top:0, the two elements should be merged, with a line break inserted at the boundary between them.
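To make the proposed rule concrete, here is a minimal sketch. It is written in Python for illustration only and is not the tool's actual code. It assumes each paragraph arrives as a hypothetical (style, text) pair, where style is the inline style attribute, and it treats a missing margin as "not zero".

```python
import re
from typing import Dict, List, Optional, Tuple


def parse_style(style: str) -> Dict[str, str]:
    """Parse an inline CSS style string into a property -> value dict."""
    props = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


def is_zero(value: Optional[str]) -> bool:
    """True for CSS lengths such as '0', '0pt', or '0.0px'; False if absent."""
    if value is None:
        return False
    match = re.fullmatch(r"([-+]?\d*\.?\d+)\s*[a-z%]*", value)
    return match is not None and float(match.group(1)) == 0.0


def merge_zero_margin_paragraphs(paragraphs: List[Tuple[str, str]]) -> List[str]:
    """Join adjacent paragraphs with a line break when the first has
    margin-bottom: 0 and the second has margin-top: 0 (the rule proposed above)."""
    merged: List[str] = []
    previous: Optional[Dict[str, str]] = None
    for style, text in paragraphs:
        props = parse_style(style)
        if (
            previous is not None
            and is_zero(previous.get("margin-bottom"))
            and is_zero(props.get("margin-top"))
        ):
            merged[-1] += "\n" + text  # line break instead of a paragraph boundary
        else:
            merged.append(text)
        previous = props
    return merged
```

For example, two zero-margin paragraphs followed by one with margin-top: 12pt would come out as two blocks, the first containing a line break.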

tamashalasi · Sep 30 '23 18:09

This is actually intentional; line breaks (shift + enter in GDocs) are supported and convert to the kind of line breaks you’re looking for in Markdown, while hard breaks will always be new paragraphs. It’s not perfect, but I have seen plenty of people format documents in the wild with no blank lines between paragraphs and chose to make the distinction this way. I’m a little skeptical of solutions here that prize one person’s style of formatting and maybe cause problems for others.

That said, checking margin-* properties is interesting; I don’t recall them getting reliably used this way when I first wrote this tool! That might be a good solution. I think we’d also need to check text-indent for documents that do paragraphs with indents or outdents on the first line, as well as looking for paragraphs that start with tab stops to manually indent the first line, like this doc: https://docs.google.com/document/d/14ReHI9Z12kaywaUPJK5ofq308e3eW1PS6GLDTi3w4oc/

Screenshot of Google doc where paragraphs are differentiated only by indenting/outdenting the first line

That then comes through with markup like:

<p style="text-indent: 36pt; margin-top: 0pt; margin-bottom: 0pt;">
    <span style="...">
        Paragraphs with the first line indented like this need to come off as separate paragraphs and not just line breaks.
    </span>
</p>
<p style="text-indent: 36pt; margin-top: 0pt; margin-bottom: 0pt;">
    <span style="...">
        This should be a new paragraph and so on and forth blahbety blahbety blah blah blah this is a long line line.
    </span>
</p>

<br />

<p style="text-indent: -36pt; margin-top: 0pt; margin-bottom: 0pt; padding: 0pt 0pt 0pt 36pt;">
    <span style="...">
        And similarly for outdenting like this — also separate paragraphs and not just line breaks blah blah blah blah blah.
    </span>
</p>
<p style="text-indent: -36pt; margin-top: 0pt; margin-bottom: 0pt; padding: 0pt 0pt 0pt 36pt;">
    <span style="...">
        This should be a new paragraph and so on and forth blahbety blahbety blah blah blah this is a long line line.
    </span>
</p>

<br />

<p dir="ltr" style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;">
    <span style="...">
        <span class="Apple-tab-span" style="white-space: pre;">	<!-- note this is an actual tab character, not a space --></span>
    </span>
    <span style="...">
        Also paragraphs with the first line indented manually with a tab stop like this one, should be separate as well.
    </span>
</p>
<p dir="ltr" style="line-height: 1.38; margin-top: 0pt; margin-bottom: 0pt;">
    <span style="...">
        <span class="Apple-tab-span" style="white-space: pre;">	<!-- note this is an actual tab character, not a space --></span>
    </span>
    <span style="...">
        This should be a new paragraph because it’s formatted the same as above, with a manual tab stop at the start of the line.
    </span>
</p>

I assume the manual tab stops (<span class="Apple-tab-span" style="white-space: pre;">) are going to look different on Windows and Linux, so that probably needs some more testing. I’m surprised it seems the same across browsers on macOS.
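As a rough illustration of how those extra checks could combine with the margin test, here is a hedged sketch in Python (not the tool's own code). The Paragraph type and the function names are hypothetical. It assumes the paragraph's text content keeps a leading tab character when the first line was indented with a manual tab stop, which, as noted above, may not hold on every platform.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paragraph:
    style: str  # the inline style attribute of the <p>
    text: str   # text content, with any leading tab preserved


def css_length(style: str, prop: str) -> Optional[float]:
    """Return the numeric part of a CSS length for `prop`, or None if absent."""
    pattern = r"(?:^|;)\s*" + re.escape(prop) + r"\s*:\s*([-+]?\d*\.?\d+)"
    match = re.search(pattern, style, re.IGNORECASE)
    return float(match.group(1)) if match else None


def marks_new_paragraph(p: Paragraph) -> bool:
    """True if the first line is indented, outdented, or starts with a tab."""
    indent = css_length(p.style, "text-indent")
    return (indent is not None and indent != 0) or p.text.startswith("\t")


def can_merge(prev: Paragraph, nxt: Paragraph) -> bool:
    """Merge only when both touching margins are zero and the second
    paragraph shows no visual sign of starting a new paragraph."""
    return (
        css_length(prev.style, "margin-bottom") == 0
        and css_length(nxt.style, "margin-top") == 0
        and not marks_new_paragraph(nxt)
    )
```

With the sample markup above, all three pairs (indented, outdented, and tab-indented) would stay as separate paragraphs, while two plain zero-margin paragraphs would still be merged.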

Mr0grog · Oct 02 '23 06:10