[Feature Request] Allow use of a custom PDF generator
I'm not really happy with the way the browser engines render PDFs from the markdown files. Code blocks get split across pages, lines are sometimes cut in the middle, and so on. So I've created a custom LaTeX template for pandoc to do these conversions instead of docfx. Nevertheless, docfx is great and provides a nice solution for .NET project documentation.
I'd love to be able to change the default executor for the PDF conversion from Chromium to a custom command that takes specified arguments and builds a PDF that docfx then integrates seamlessly.
Currently, I've implemented a hacky solution where I explicitly link the PDF files in the TOCs as child items and call pandoc before running docfx. It works, but it's clunky.
@merlinschumacher, I'm curious about which custom PDF generator you prefer. Is it a proprietary tool? Docfx used to use wkhtmltopdf, but it hasn't been actively maintained. Is there an alternative you're using now?
I use pandoc in combination with a custom LaTeX template. That's essentially all. The LaTeX template is just a slightly modified version of pandoc's default template. And there are even popular templates like Eisvogel that are built for exactly the purpose of converting Markdown to PDF and looking good while at it.
Pandoc can also receive metadata that is used in the resulting files. So I've been able to inject information like the build date of the files into the PDFs, using metadata and corresponding placeholders in the template.
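As a rough illustration of the metadata approach, here is a minimal sketch of how such a pandoc call could be assembled. The function name, the file paths, and the `build-date` key are made up for this example; the template would need a matching `$build-date$` placeholder. The choice of `xelatex` as the default engine is also just an assumption.

```python
from datetime import date
from pathlib import Path


def build_pandoc_command(source, output, template, metadata, pdf_engine="xelatex"):
    """Return the argument list for one Markdown-to-PDF pandoc run."""
    command = [
        "pandoc",
        str(source),
        "-o", str(output),
        f"--template={template}",
        f"--pdf-engine={pdf_engine}",
    ]
    # Each metadata key is exposed to the template as a variable of the
    # same name (hypothetical example: $build-date$).
    for key, value in metadata.items():
        command += ["-M", f"{key}={value}"]
    return command


if __name__ == "__main__":
    # Hypothetical paths; adjust to the actual project layout.
    cmd = build_pandoc_command(
        Path("articles/intro.md"),
        Path("articles/intro.pdf"),
        Path("templates/custom.latex"),
        {"build-date": date.today().isoformat()},
    )
    print(" ".join(cmd))
```

Returning the command as a list rather than running it directly makes it easy to log or inspect the exact call before handing it to `subprocess.run`.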
Pandoc is available for all major platforms, and so are the most commonly required dependencies. On Windows it's even available via Chocolatey or winget.
There are also pandoc filters for PlantUML and Mermaid, but I haven't gotten around to checking these out yet.
At the moment I use a Python script that calls pandoc and docfx one after another inside a custom-made Docker image. The Docker image is used in a CI/CD pipeline, where I generate the output. My setup relies on Inkscape for pandoc, which pulls in a lot of dependencies, but I believe I can replace it with something smaller like rsvg-convert.
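A script of that kind might look roughly like the sketch below. This is not the actual script, only an illustration under assumptions: the `articles` folder, the `docfx.json` name, and both function names are hypothetical, and the pandoc call is kept to the bare minimum.

```python
import subprocess
from pathlib import Path


def plan_build(markdown_files, docfx_config="docfx.json"):
    """List the commands to run: one pandoc call per file, then docfx."""
    commands = [
        ["pandoc", str(md), "-o", str(Path(md).with_suffix(".pdf"))]
        for md in markdown_files
    ]
    # docfx runs last so the generated PDFs already exist when it builds.
    commands.append(["docfx", "build", docfx_config])
    return commands


def run_build(commands, runner=subprocess.run):
    """Run each command in order; check=True aborts on the first failure."""
    for command in commands:
        runner(command, check=True)


if __name__ == "__main__":
    # Hypothetical source folder; adjust to the actual project layout.
    run_build(plan_build(sorted(Path("articles").glob("*.md"))))
```

Keeping the planning step separate from the execution step means the command list can be printed or checked in CI without pandoc or docfx being installed.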
For the conversion from HTML to PDF, pandoc can rely on WeasyPrint, which seems to support CSS as well and is said to have better support for print-related CSS rules. I haven't tested that one, though.
You can use CSS to specify when page breaks happen.
See: https://www.w3schools.com/cssref/pr_print_pageba.php
You could create a page break element using CSS:
@media print {
  page { page-break-after: always; }
}
You can also add margins like this:
@media print {
  @page {
    margin: 1in 0.2in;
  }
}
Note that the @media print rule only affects the PDF and printouts.
In my opinion, continuing to use Playwright for generating the PDF output is the best solution. Reason: the generated output looks at least very similar to the pages when viewed in a browser. Using any other solution may result in an output that doesn't look like the HTML pages anymore.
In my opinion, continuing to use Playwright for generating the PDF output is the best solution.
Best or not is subjective. The request is to allow for custom generators, not to take away the current one.
I have a similar request. I use xelatex as the PDF engine with pandoc. I believe docfx uses wkhtmltopdf, which stitches HTML together and is not as feature-rich as pandoc and xelatex. Is there a way to customize or replace wkhtmltopdf with another PDF engine for better PDF output? Can it be done? If so, please point me in the right direction.
@TheDevelolper In which file did you add this piece of code, and how is it recognized by DocFX? I’ve tried several approaches to include a custom CSS file to improve the PDF layout, but none of them seems to take effect; only the modern template is applied. I added a main.css file to the my-template/public folder in my project and added the path to the docfx.json file, but without success. I haven’t found a way to override the default CSS settings.