mkdocs-exporter Displaying a code block in multiple pages in PDF

Hello, in one of my functions the code block takes up more than one page, and the code is not shown in the PDF. (Screenshot attached: 2024-10-11 17-59-33)

Oct 11 '24 21:10 CarduCaldeira

Hello @CarduCaldeira,

Did you manage to solve the issue? If not, could you give me more details about your configuration (theme, plugins used...), please?

Thank you

Oct 29 '24 09:10 adrienbrignon

I encountered the same issue using Material for MkDocs. It seems that <pre> stops rendering newlines after the page break. PyMdown Extensions code blocks use a two-cell table to render line numbers (all line numbers in the first cell and all the content in the second); the issue is likely due to the PDF page break not carrying over all the CSS/properties. These issues also affect code blocks without line numbers, which are rendered as only <div><pre><code>....

Below is my workaround. It hides the line numbers table cell and uses CSS and a custom attribute to add them back as ::before pseudo-elements. It also replaces all the newlines in the <code> node with <br> tags to compensate for them breaking after a page break. The padding-left: 1.2rem; CSS is to adjust for the amount of space allowed for the line numbers.

CSS:

@media screen {
 .highlight a[data-line-number]::before {
   display: none;
 }
}
@media print {
 /* Add the line number as a ::before pseudo element when using parse_code_linenums() */
 .highlight code a[data-line-number]::before {
   content: attr(data-line-number) !important;
   position: absolute;
   display: inline;
   right: -1em;
   visibility: visible;
   white-space: pre-line;
 }
 .highlight code a[data-line-number] + span:not(.hll),
 .highlight code a[data-line-number] + span.hll > span:first-child {
   padding-left: 1.2rem;
 }
 /*  Reinforce pre behaviors in case child elements are separated from `<pre>` by a page break */
 .highlight pre * {
   white-space-collapse: preserve !important;
   word-break: normal !important;
 }
}

JavaScript:

/**
 * An interface with the MkDocs Exporter plugin.
 */
window.MkDocsExporter = {

  /**
   * Render the page...
   */
  render: async () => {
    parse_code_linenums();
  }
};

// Removes the `linenos` table cell and adds the line numbers as `data-line-number` attribute to allow display using ::before instead
function parse_code_linenums() {
  const code_blocks = document.querySelectorAll('.highlight');

  code_blocks.forEach(code_block => {
    const codeLinks = code_block.querySelectorAll('a[id^="__codelineno"]');
    const linenoLinks = code_block.querySelectorAll('.linenos a');

    // Create a map of href to link text for the .linenos links within the current table
    const linenoMap = {};
    linenoLinks.forEach(link => {
      const href = link.getAttribute('href');
      const text = link.textContent.trim();
      linenoMap[href] = text;
    });

    const lines_cell = code_block.querySelector('.linenos');
    // replace newlines in code block with `<br>` to fix newlines breaking after a page break
    // (declared with `const` so `code` does not leak as an implicit global)
    const code = lines_cell
      ? code_block.querySelector('.code code')
      : code_block.querySelector('code');
    if (code) {
      replaceNewlinesWithBr(code);
    }

    let maxLength = 1;
    codeLinks.forEach(link => {
      const href = `#${link.id}`; // Create href to match the link id

      if (linenoLinks.length) {
        if (linenoMap[href]) { // Check if corresponding href exists in the same table's linenos
          link.style.position = 'relative'; // Positioning for the ::before element
          link.setAttribute('data-line-number', linenoMap[href]); // Set data attribute for line number
          maxLength = linenoMap[href].length; // save the length of the last number
        }
        else {
          link.setAttribute('data-line-number', ''); // Add a blank attribute to let CSS know when there are line numbers
        }
      }
    });
    codeLinks.forEach(link => {
      // find the span element after the line number link to pad the spacing for the added numbers
      let sibling = link.nextElementSibling; // declared locally to avoid an implicit global
      if (sibling) {
        if (sibling.classList.contains('hll')) {
          sibling = sibling.firstElementChild; // target the content span, not the highlighting
        }
      }
    });
  });
}

// replace all newline characters in `element`'s text nodes with `<br>` tags
function replaceNewlinesWithBr(element) {
  // Get all child nodes of the element
  const childNodes = Array.from(element.childNodes);

  // Iterate over each child node
  childNodes.forEach(node => {
    if (node.nodeType === Node.TEXT_NODE) {
      // Split the text content by newline characters
      const textParts = node.nodeValue.split('\n');

      // Create a document fragment to hold new nodes
      const fragment = document.createDocumentFragment();

      textParts.forEach((part, index) => {
        // Create a text node for the part
        const textNode = document.createTextNode(part);
        fragment.appendChild(textNode); // Append the text node

        // If this is not the last part, add a <br>
        if (index < textParts.length - 1) {
          const br = document.createElement('br');
          fragment.appendChild(br);
        }
      });

      // Replace the original text node with the new content
      node.parentNode.replaceChild(fragment, node);
    }
  });
}
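
The padding-left: 1.2rem; in the CSS is a fixed value, and maxLength in parse_code_linenums() is computed but never applied. Below is a minimal sketch of how the padding could instead be derived from the widest line number in a block. The helper name gutterPaddingRem, the --gutter-pad custom property, and the 0.6rem-per-digit factor are all assumptions for illustration, not values measured against the theme.

```javascript
// Hypothetical helper: estimate the left padding (in rem) needed for the
// ::before line numbers, based on the widest number in the block.
// ASSUMPTION: each digit needs roughly 0.6rem in the code font.
function gutterPaddingRem(lineNumbers, remPerDigit = 0.6) {
  const widest = lineNumbers.reduce(
    (max, n) => Math.max(max, String(n).trim().length),
    1 // never narrower than one digit
  );
  return widest * remPerDigit;
}

// Browser-only usage sketch; skipped when no DOM is available.
// Expects parse_code_linenums() to have set data-line-number already.
if (typeof document !== 'undefined') {
  document.querySelectorAll('.highlight').forEach(block => {
    const numbers = Array.from(
      block.querySelectorAll('a[data-line-number]'),
      a => a.getAttribute('data-line-number')
    ).filter(Boolean); // drop the blank placeholder attributes
    if (numbers.length) {
      // --gutter-pad is a hypothetical custom property name
      block.style.setProperty('--gutter-pad', `${gutterPaddingRem(numbers)}rem`);
    }
  });
}
```

The print CSS could then use padding-left: var(--gutter-pad, 1.2rem); so that blocks without a computed value fall back to the fixed padding.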

Dec 04 '24 16:12 nbanyan

Another related issue with code blocks that cross page breaks is that the code content will always try to start at the top of the next page, leaving the title (filename) element floating alone at the end of the previous page. Strangely, this element resists changes from JavaScript, so a Python hook is needed to implement a workaround.

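import copy
from typing import Union

from bs4 import BeautifulSoup
from mkdocs.config.defaults import MkDocsConfig
from mkdocs.structure.files import Files
from mkdocs.structure.pages import Page


def _moveCodeFilename(parsed_html: BeautifulSoup):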
    """
    Moves the title (filename) of code blocks into the <code> section so it may be rendered closer to the code block

    :param parsed_html: The parsed HTML to process.
    :type parsed_html: BeautifulSoup
    :return: The altered HTML
    :rtype: BeautifulSoup
    """

    code_blocks = parsed_html.find_all(class_='highlight')

    for code_block in code_blocks:
        header_span = code_block.select_one('span.filename')
        pre_tag = code_block.select_one('.highlight > pre code')
        if not pre_tag:
            pre_tag = code_block.select_one('.highlight .code pre code')

        if header_span and pre_tag:
            # Insert a copy of the header at the start of the <code> element.
            # The original header stays where it is; CSS decides which of the
            # two is shown on screen and in print.
            header_copy = copy.deepcopy(header_span)
            pre_tag.insert(0, header_copy)

    return parsed_html

def on_page_content(html: str,
                    page: Page,
                    config: MkDocsConfig,
                    files: Files) -> Union[str, None]:
    """
    The `page_content` event is called after the Markdown text is rendered to
    HTML (but before being passed to a template) and can be used to alter the
    HTML body of the page.

    Args:
        html: HTML rendered from Markdown source as string
        page: `mkdocs.structure.pages.Page` instance
        config: global configuration object
        files: global files collection

    Returns:
        HTML rendered from Markdown source as string
    """

    # Parse the HTML content
    parsed_html = BeautifulSoup(html, 'html.parser')
    parsed_html = _moveCodeFilename(parsed_html)

    # Return the modified HTML
    return str(parsed_html)

Since this affects the HTML of the actual website, we need CSS to control the rendering of the added Filename element:

@media screen {
  .highlight code .filename {
    display: none !important;
  }
}
@media print {
  .highlight > .filename,
  .highlight th {
    display: none !important;
  }
  .highlight .linenos {
    display: none;
  }

  .md-typeset *:not(h3) + h4 {
    margin-top: 1.5rem;
  }
  .highlight code .filename {
    font-family: 'Classico URW T OT', 'Noto Serif JP', serif;
    font-size: .6rem;
    margin: 0;
  }
}

The following JavaScript invokes parse_code_linenums (from my previous comment) on the website as well, so that HTML, print, and PDF output are handled the same way.

$(document).ready(function() {
  parse_code_linenums();
});
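
Note that $(document).ready assumes jQuery is loaded on the site. If it is not, a plain-DOM equivalent might look like the sketch below. onReady is a hypothetical helper name, and I have not checked how this interacts with features such as Material's instant navigation.

```javascript
// Hypothetical helper: run `callback` once the document has been parsed,
// without depending on jQuery. `doc` is passed in so the helper can be
// exercised outside a browser.
function onReady(doc, callback) {
  if (doc.readyState === 'loading') {
    // Still parsing: wait for the DOM to be ready, then run once.
    doc.addEventListener('DOMContentLoaded', callback, { once: true });
  } else {
    // 'interactive' or 'complete': the DOM is already available.
    callback();
  }
}

// Browser-only usage; parse_code_linenums is the function from the earlier comment.
if (typeof document !== 'undefined' && typeof parse_code_linenums === 'function') {
  onReady(document, parse_code_linenums);
}
```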

Dec 09 '24 18:12 nbanyan