go-weasyprint

Fix font handling

Open PylotLight opened this issue 1 year ago • 17 comments

I never used to have to deal with fonts in python, so not sure why I'm being forced to define all this stuff I don't want to deal with here.

fs, err := fc.LoadFontsetFile(fontmapCache)
fontconfig := text.NewFontConfigurationPango(fcfonts.NewFontMap(fc.Standard.Copy(), fs))

All I want is a simple HTML to PDF conversion here, from string to file.

err := pdf.HtmlToPdf(os.Stdout, utils.InputString(html), fs)

PylotLight avatar Jul 05 '24 04:07 PylotLight

Your snippet is almost correct: just pass fontconfig instead of fs to HtmlToPdf.

The Python implementation uses C dependencies to handle fonts. This module uses a pure Go implementation, which stores font information in an on-disk cache. We have chosen to expose the path to that font cache.

I'm working towards enabling go-text as a replacement for the text engine, so that the FontConfiguration creation will slightly change in the future. The reference to fcfonts.NewFontMap and fc.Standard will not be needed anymore.

benoitkugler avatar Jul 05 '24 07:07 benoitkugler

I don't have the fontmapCache file present so I can't get this example to work currently. Copying the test file exactly gives:

	var fontconfig text.FontConfiguration
	const fontmapCache = "pdf/test/cache.fc"
	fs, _ := fc.LoadFontsetFile(fontmapCache)
	fontconfig = text.NewFontConfigurationPango(fcfonts.NewFontMap(fc.Standard.Copy(), fs))
	err := goweasyprint.HtmlToPdf(os.Stdout, utils.InputString(html), fontconfig)

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xff864a]

goroutine 1 [running]:
github.com/benoitkugler/textprocessing/pango.(*GlyphString).fallbackShape(0xc0008157c0, {0xc00013a0b0, 0x2c, 0x2c}, 0xc000694ff0)
	/home/User/go/pkg/mod/github.com/benoitkugler/[email protected]/pango/glyphs.go:213 +0x14a

PylotLight avatar Jul 08 '24 00:07 PylotLight

See the file pdf/draw_test.go and this snippet:

// this command has to run once
fmt.Println("Scanning fonts...")
_, err := fc.ScanAndCache(fontmapCache)
if err != nil {
	log.Fatal(err)
}

fs, err := fc.LoadFontsetFile(fontmapCache)
if err != nil {
	log.Fatal(err)
}
fontconfig = text.NewFontConfigurationPango(fcfonts.NewFontMap(fc.Standard.Copy(), fs))

benoitkugler avatar Jul 08 '24 14:07 benoitkugler

2024/07/09 09:44:17 invalid font dir /usr/share/texmf/fonts/opentype/public stat /usr/share/texmf/fonts/opentype/public: no such file or directory

Ye this font stuff is just really not working for me. Perhaps I just wait for the replacement of these parts ;p I got go-wkhtmltopdf working for now, but I'll come back to this one if I can ever get it working.

PylotLight avatar Jul 08 '24 23:07 PylotLight

The error message is just a warning; it shouldn't be fatal. What is the error returned by fc.ScanAndCache?

benoitkugler avatar Jul 09 '24 07:07 benoitkugler

Full example

func main() {
	html := ""

	var fontconfig text.FontConfiguration
	const fontmapCache = "pdf/test/cache.fc"
	fmt.Println("Scanning fonts...")
	_, err := fc.ScanAndCache(fontmapCache)
	if err != nil {
		log.Fatal(err)
	}

	fs, err := fc.LoadFontsetFile(fontmapCache)
	if err != nil {
		log.Fatal(err)
	}
	fontconfig = text.NewFontConfigurationPango(fcfonts.NewFontMap(fc.Standard.Copy(), fs))
	err = goweasyprint.HtmlToPdf(os.Stdout, utils.InputString(html), fontconfig)
}

Scanning fonts...
2024/07/09 21:59:38 invalid font dir /usr/share/texmf/fonts/opentype/public stat /usr/share/texmf/fonts/opentype/public: no such file or directory
2024/07/09 21:59:39 open pdf/test/cache.fc: no such file or directory
exit status 1

PylotLight avatar Jul 09 '24 12:07 PylotLight

Thank you for the full example. There is something strange though: only one fatal log should happen (since log.Fatal exits the program). Could you be even more specific and print all the errors? (That is, add fmt.Println(err).)

benoitkugler avatar Jul 09 '24 14:07 benoitkugler

That was with empty html string, this link has a sample page in it. https://pastecode.dev/s/twxqkbfe

Scanning fonts...
2024/07/10 00:42:37 invalid font dir /usr/share/texmf/fonts/opentype/public stat /usr/share/texmf/fonts/opentype/public: no such file or directory
open pdf/test/cache.fc: no such file or directory
loading font set: open pdf/test/cache.fc: no such file or directory
webrender.progress: 2024/07/10 00:42:40 Step 1 - Fetching and parsing HTML
webrender.progress: 2024/07/10 00:42:40 Step 3 - Applying CSS - 1 sheet(s)
webrender.progress: 2024/07/10 00:42:40 Step 4 - Creating formatting structure
webrender.progress: 2024/07/10 00:42:40 Step 5 - Creating layout - Page 1
webrender.progress: 2024/07/10 00:42:40 Step 6 - Drawing pages
webrender.progress: 2024/07/10 00:42:40 Step 7 - Adding PDF metadata
%PDF-1.7
%����
4 0 obj
<</DecodeParms [ null ] /Filter [/FlateDecode] /Length 69 >>
stream
���� C��W��sfX��5���oʲ~SV=sT��׹��dY{ɲ�%4!��4M�|����
endstream
endobj
3 0 obj
<<
/Type/Page
/Parent 2 0 R
/MediaBox [0 0 595.27563 841.88983]
/BleedBox [0 0 595.27563 841.88983]
/TrimBox [0 0 595.27563 841.88983]
/Contents [4 0 R]
>>
endobj
2 0 obj
<</Type/Pages/Count 1/Kids [3 0 R]>>
endobj
1 0 obj
<<
/Type/Catalog
/Pages 2 0 R
>>
endobj
5 0 obj
<<
/Producer (Go-WebRender 0.59)
>>
endobj
xref
0 6
0000000000 65535 f 
0000000401 00000 n 
0000000349 00000 n 
0000000178 00000 n 
0000000015 00000 n 
0000000449 00000 n 
trailer
<<
/Size 6
/Root 1 0 R
/Info 5 0 R
>>
startxref
500

PylotLight avatar Jul 09 '24 14:07 PylotLight

Could you add the exact Go sample you use? I still don't get why the program does not exit at the first log.Fatal.

benoitkugler avatar Jul 13 '24 15:07 benoitkugler

> Could you add the exact Go sample you use? I still don't get why the program does not exit at the first log.Fatal.

package main

import (
	"fmt"
	"os"

	goweasyprint "github.com/benoitkugler/go-weasyprint"
	fc "github.com/benoitkugler/textprocessing/fontconfig"
	"github.com/benoitkugler/textprocessing/pango/fcfonts"
	"github.com/benoitkugler/webrender/text"
	"github.com/benoitkugler/webrender/utils"
)

func main() {
	html := `<html lang="en">
  <head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta http-equiv="X-UA-Compatible" content="ie=edge">
    <title>My Website</title>
  </head>
  <body>
    <main>
        <h1>Welcome to My Website</h1>  
    </main>
  </body>
</html>
`

	var fontconfig text.FontConfiguration
	const fontmapCache = "pdf/test/cache.fc"
	fmt.Println("Scanning fonts...")
	_, err := fc.ScanAndCache(fontmapCache)
	if err != nil {
		fmt.Println(err.Error())
	}

	fs, err := fc.LoadFontsetFile(fontmapCache)
	if err != nil {
		fmt.Println(err.Error())
	}
	fontconfig = text.NewFontConfigurationPango(fcfonts.NewFontMap(fc.Standard.Copy(), fs))
	err = goweasyprint.HtmlToPdf(os.Stdout, utils.InputString(html), fontconfig)
	if err != nil {
		fmt.Println(err.Error())
	}
}

PylotLight avatar Jul 14 '24 03:07 PylotLight

The issue is here:

_, err := fc.ScanAndCache(fontmapCache)
if err != nil {
    fmt.Println(err.Error())
}

I think the directory for the font cache file, defined as const fontmapCache = "pdf/test/cache.fc", does not exist on your machine, so the cache file cannot be written.

Could you adjust this constant to something like <a directory I own>/cache.fc, or maybe simply cache.fc? Thank you.

benoitkugler avatar Jul 14 '24 14:07 benoitkugler

~~but i dont have that file, and there would be no reason to given it was never explained in any doc anywhere?~~ nvm it might be working, ima test it at work in the morning.

PylotLight avatar Jul 14 '24 14:07 PylotLight

Alrighty it wrote my file, but didn't process the inline css inside the string like wkhtml does.

	// weasyprint
	var fontconfig text.FontConfiguration
	const fontmapCache = "cache.fc"
	fmt.Println("Scanning fonts...")
	_, err = fc.ScanAndCache(fontmapCache)
	if err != nil {
		return err
	}

	fs, err := fc.LoadFontsetFile(fontmapCache)
	if err != nil {
		return err
	}
	fontconfig = text.NewFontConfigurationPango(fcfonts.NewFontMap(fc.Standard.Copy(), fs))
	file, err := os.Create(filename)
	if err != nil {
		return err
	}
	err = goweasyprint.HtmlToPdf(file, utils.InputString(buf.String()), fontconfig)
	if err != nil {
		return err
	}
	// wkhtml
	pdfg, err := wkhtmltopdf.NewPDFGenerator()
	if err != nil {
		log.Fatal(err)
	}
	pdfg.AddPage(wkhtmltopdf.NewPageReader(strings.NewReader(buf.String())))
	err = pdfg.Create()
	if err != nil {
		log.Fatal(err)
	}
	err = pdfg.WriteFile(filename)
	if err != nil {
		log.Fatal(err)
	}

PylotLight avatar Jul 15 '24 05:07 PylotLight

Can you post the exact HTML string you use? I didn't grasp which CSS you are referring to.

benoitkugler avatar Jul 15 '24 17:07 benoitkugler

> Can you post the exact HTML string you use? I didn't grasp which CSS you are referring to.

https://paste.ofcode.org/iCum4BQTjKeWcQkhexMVJp

PylotLight avatar Jul 15 '24 23:07 PylotLight

Thank you. Which CSS is not processed by GoWeasyprint?

benoitkugler avatar Jul 16 '24 16:07 benoitkugler

PDF result: AU_PRD_RITM17270697.pdf

Wkhtml generates the correct coloring on each cell, and the correct content, from the same string. The go-weasyprint output just doesn't render properly for some reason.

PylotLight avatar Jul 17 '24 02:07 PylotLight