pdf2json
Not working with PDF output from chromium 136
With Chromium 135 on Linux, I can save a simple PDF file and then parse it with pdf2json. The content shows up in each page's Texts property.
With the same simple PDF content saved from Chromium 136, parsing with pdf2json results in every page's Texts being an empty array.
{
  "Transcoder": "[email protected] [https://github.com/modesty/pdf2json]",
  "Meta": {
    "PDFFormatVersion": "1.4",
    "IsAcroFormPresent": false,
    "IsXFAPresent": false,
    "Title": "tmp-34722-xETce7Wa4bRR-.html",
    "Creator": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/136.0.0.0 Safari/537.36",
    "Producer": "Skia/PDF m136",
    "CreationDate": "D:20250519171424+00'00'",
    "ModDate": "D:20250519171424+00'00'",
    "Metadata": {}
  },
  "Pages": [
    {
      "Width": 38.25,
      "Height": 49.5,
      "HLines": [],
      "VLines": [],
      "Fills": [],
      "Texts": [],
      "Fields": [],
      "Boxsets": []
    }
  ]
}
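For anyone trying to reproduce this, a small check like the following may help flag affected files. It is only a sketch: `findPagesWithoutText` is a hypothetical helper (not part of pdf2json), and it assumes nothing beyond the output shape shown above, i.e. a top-level `Pages` array whose entries carry a `Texts` array.

```javascript
// Hypothetical helper, not a pdf2json API: returns the 1-based numbers of
// pages whose Texts array is missing or empty in the parsed output.
function findPagesWithoutText(output) {
  const pages = Array.isArray(output && output.Pages) ? output.Pages : [];
  const emptyPages = [];
  pages.forEach((page, index) => {
    const texts = Array.isArray(page.Texts) ? page.Texts : [];
    if (texts.length === 0) {
      emptyPages.push(index + 1); // 1-based, matching "page:1" in the logs
    }
  });
  return emptyPages;
}

// Trimmed copy of the output above (only the Pages part is needed).
const sample = {
  Pages: [
    {
      Width: 38.25,
      Height: 49.5,
      HLines: [],
      VLines: [],
      Fills: [],
      Texts: [],
      Fields: [],
      Boxsets: [],
    },
  ],
};

console.log(findPagesWithoutText(sample)); // [ 1 ]
```

Running this against the output of a Chromium 135 PDF and a Chromium 136 PDF should make the difference visible without reading the whole JSON.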
Here is the simple PDF saved from chromium 136.
If it helps, I ran with verbosity set to high. On Windows, where this still works, the logs are:
Warning: Setting up fake worker.
Info: PDF loaded. pagesCount = 1
Info: start to parse page:1
Info: Skipped: tiny fill: 0 x 0
Info: Success: Page 1
Info: complete parsing page:1
Info: PDF parsing completed.
On Linux, where this has stopped working, the logs for the same PDF file are:
Warning: Setting up fake worker.
Info: PDF loaded. pagesCount = 1
Info: start to parse page:1
Info: Skipped: tiny fill: 0 x 0
Info: Skipped: tiny fill: 0.441 x 0.567
Info: Skipped: tiny fill: 0.506 x 0.567
Info: Skipped: tiny fill: 0.414 x 0.547
Info: Skipped: tiny fill: 0.463 x 0.547
Info: Skipped: tiny fill: 0.352 x 0.547
Info: Skipped: tiny fill: 0.414 x 0.547
Info: Skipped: tiny fill: 0.463 x 0.547
Info: Success: Page 1
Info: complete parsing page:1
Info: PDF parsing completed.
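The Linux log contains seven additional "Skipped: tiny fill" entries with non-zero dimensions, so it appears that, for some reason, the content is being skipped rather than ending up in Texts.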
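To compare the two logs above more systematically, the skipped entries can be pulled out with a short script. This is a rough sketch: `skippedFills` is a hypothetical helper, and the regular expression assumes only the "Skipped: tiny fill: W x H" line format seen in the logs above.

```javascript
// Hypothetical helper: extracts the dimensions of every
// "Skipped: tiny fill: W x H" line from a verbose log.
function skippedFills(logText) {
  const pattern = /Skipped: tiny fill: ([\d.]+) x ([\d.]+)/;
  return logText
    .split(/\r?\n/)
    .map((line) => line.match(pattern))
    .filter(Boolean)
    .map((m) => ({ width: Number(m[1]), height: Number(m[2]) }));
}

// Log lines copied from the two runs above.
const workingLog = [
  "Info: start to parse page:1",
  "Info: Skipped: tiny fill: 0 x 0",
  "Info: Success: Page 1",
].join("\n");

const failingLog = [
  "Info: start to parse page:1",
  "Info: Skipped: tiny fill: 0 x 0",
  "Info: Skipped: tiny fill: 0.441 x 0.567",
  "Info: Skipped: tiny fill: 0.506 x 0.567",
  "Info: Skipped: tiny fill: 0.414 x 0.547",
  "Info: Skipped: tiny fill: 0.463 x 0.547",
  "Info: Skipped: tiny fill: 0.352 x 0.547",
  "Info: Skipped: tiny fill: 0.414 x 0.547",
  "Info: Skipped: tiny fill: 0.463 x 0.547",
  "Info: Success: Page 1",
].join("\n");

// Keep only the fills that actually have an area.
const nonZero = (fills) => fills.filter((f) => f.width > 0 && f.height > 0);

console.log("working run, non-zero skipped fills:", nonZero(skippedFills(workingLog)).length);
console.log("failing run, non-zero skipped fills:", nonZero(skippedFills(failingLog)).length);
```

The non-zero entries only appear in the run where Texts comes back empty, which is what makes them look related to the missing content.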