Not working with PDF output from chromium 136

Open • robross0606 opened this issue 7 months ago • 1 comment

With Chromium 135 on Linux, I am able to save a simple PDF file and then parse it with pdf2json. The content shows up in each page's Texts property.

With the same simple PDF content saved from Chromium 136, parsing with pdf2json results in every page's Texts being an empty array:

{
  "Transcoder": "[email protected] [https://github.com/modesty/pdf2json]",
  "Meta": {
    "PDFFormatVersion": "1.4",
    "IsAcroFormPresent": false,
    "IsXFAPresent": false,
    "Title": "tmp-34722-xETce7Wa4bRR-.html",
    "Creator": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/136.0.0.0 Safari/537.36",
    "Producer": "Skia/PDF m136",
    "CreationDate": "D:20250519171424+00'00'",
    "ModDate": "D:20250519171424+00'00'",
    "Metadata": {}
  },
  "Pages": [
    {
      "Width": 38.25,
      "Height": 49.5,
      "HLines": [],
      "VLines": [],
      "Fills": [],
      "Texts": [],
      "Fields": [],
      "Boxsets": []
    }
  ]
}
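For anyone trying to confirm the same symptom on their own files, here is a minimal sketch of a check against the parsed output. `findPagesWithoutText` is a hypothetical helper, not part of pdf2json; it only assumes the output has the `Pages[].Texts` shape shown above.

```javascript
// Hypothetical helper (not part of pdf2json): return the 1-based numbers of
// pages whose Texts array is missing or empty.
function findPagesWithoutText(pdfData) {
  const pages = Array.isArray(pdfData?.Pages) ? pdfData.Pages : [];
  return pages
    .map((page, index) => ({
      page: index + 1,
      textCount: Array.isArray(page.Texts) ? page.Texts.length : 0,
    }))
    .filter((entry) => entry.textCount === 0)
    .map((entry) => entry.page);
}

// Trimmed copy of the Chromium 136 output from this report.
const chromium136Output = {
  Pages: [
    {
      Width: 38.25,
      Height: 49.5,
      HLines: [],
      VLines: [],
      Fills: [],
      Texts: [],
      Fields: [],
      Boxsets: [],
    },
  ],
};

console.log(findPagesWithoutText(chromium136Output)); // prints [ 1 ]
```

The same function should work on the object delivered by pdf2json's `pdfParser_dataReady` event, assuming it has the structure shown in this report.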

Here is the simple PDF saved from chromium 136.

test.pdf

May 19 '25 17:05 robross0606

If it helps, I ran with verbosity set to high. On Windows where this still works, the logs are:

Warning: Setting up fake worker.
Info: PDF loaded. pagesCount = 1
Info: start to parse page:1
Info: Skipped: tiny fill: 0 x 0
Info: Success: Page 1
Info: complete parsing page:1
Info: PDF parsing completed.

On Linux where this has stopped working, the logs for the same PDF file are:

Warning: Setting up fake worker.
Info: PDF loaded. pagesCount = 1
Info: start to parse page:1
Info: Skipped: tiny fill: 0 x 0
Info: Skipped: tiny fill: 0.441 x 0.567
Info: Skipped: tiny fill: 0.506 x 0.567
Info: Skipped: tiny fill: 0.414 x 0.547
Info: Skipped: tiny fill: 0.463 x 0.547
Info: Skipped: tiny fill: 0.352 x 0.547
Info: Skipped: tiny fill: 0.414 x 0.547
Info: Skipped: tiny fill: 0.463 x 0.547
Info: Success: Page 1
Info: complete parsing page:1
Info: PDF parsing completed.
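To make the difference between the two logs easier to compare, here is a rough sketch that counts the "Skipped: tiny fill" entries. `summarizeTinyFills` is a hypothetical helper; it assumes only the log line format shown in the two listings above.

```javascript
// Hypothetical helper: collect the "Skipped: tiny fill: W x H" entries
// from a verbose log and count how many have a non-zero size.
function summarizeTinyFills(logText) {
  const pattern = /Skipped: tiny fill:\s*([\d.]+)\s*x\s*([\d.]+)/g;
  const fills = [];
  for (const match of logText.matchAll(pattern)) {
    fills.push({ width: Number(match[1]), height: Number(match[2]) });
  }
  const nonZero = fills.filter((f) => f.width > 0 && f.height > 0).length;
  return { total: fills.length, nonZero, fills };
}

// Log copied from the Windows run in this report.
const windowsLog = `Warning: Setting up fake worker.
Info: PDF loaded. pagesCount = 1
Info: start to parse page:1
Info: Skipped: tiny fill: 0 x 0
Info: Success: Page 1
Info: complete parsing page:1
Info: PDF parsing completed.`;

// Log copied from the Linux run in this report.
const linuxLog = `Warning: Setting up fake worker.
Info: PDF loaded. pagesCount = 1
Info: start to parse page:1
Info: Skipped: tiny fill: 0 x 0
Info: Skipped: tiny fill: 0.441 x 0.567
Info: Skipped: tiny fill: 0.506 x 0.567
Info: Skipped: tiny fill: 0.414 x 0.547
Info: Skipped: tiny fill: 0.463 x 0.547
Info: Skipped: tiny fill: 0.352 x 0.547
Info: Skipped: tiny fill: 0.414 x 0.547
Info: Skipped: tiny fill: 0.463 x 0.547
Info: Success: Page 1
Info: complete parsing page:1
Info: PDF parsing completed.`;

console.log(summarizeTinyFills(windowsLog).nonZero); // prints 0
console.log(summarizeTinyFills(linuxLog).nonZero); // prints 7
```

The Linux log has seven skipped fills with a non-zero size that do not appear in the Windows log.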

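It appears that, for some reason, content is being skipped: the Linux log contains seven additional "Skipped: tiny fill" entries with non-zero sizes that the Windows log does not have.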

May 19 '25 20:05 robross0606