Parser is skipping the first page

Open • datawench opened this issue 1 year ago • 1 comment

  • PHP Version: 8.2.20
  • PDFParser Version: 2.11.0

Description:

I'm parsing the PDF which can be found here: https://oag.ca.gov/system/files/Maxar%20-%20Adult%20CA%20Sample%20Ltr_Redacted.pdf

The parser appears to skip the first page and extracts text only from the last two pages.

PDF input

See link above.

Expected output & actual output

I would expect the output to start with "MAXAR SPACE SYSTEMS", or perhaps "I write on behalf of." Instead, this is what I get:

"not been delayed due to any law enforcement investigation. We are also taking additional actions as required..." with interspersed tabs.

Code

I'm using the simplest possible code:

use Smalot\PdfParser\Parser;

$parser = new Parser();
$pdf = $parser->parseFile($filePath); // parse the whole document
$text = $pdf->getText();              // concatenated text of all pages

datawench, Nov 22 '24 15:11

The first page of the linked PDF is an image of text (plus a QR code, etc.). pdfparser can't extract text from images, so only the text on the remaining pages is returned.

rupertj, Jul 16 '25 08:07
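
One way to confirm this kind of diagnosis is to extract text page by page and flag any page that comes back empty, since an image-only page may yield no text at all. The following is a minimal sketch, not part of pdfparser itself: `findTextlessPages` and `extractPageTexts` are hypothetical helper names introduced here for illustration, and the sketch assumes smalot/pdfparser is installed via Composer and that a scanned page produces empty (or whitespace-only) text.

```php
<?php
// Sketch only: flags pages whose extracted text is empty, which often
// (but not always) means the page is a scanned image.

/**
 * Hypothetical helper (not part of pdfparser): returns the 1-based numbers
 * of pages whose extracted text is empty after trimming whitespace.
 *
 * @param string[] $pageTexts Text of each page, in document order.
 * @return int[]
 */
function findTextlessPages(array $pageTexts): array
{
    $textless = [];
    foreach (array_values($pageTexts) as $index => $text) {
        if (trim($text) === '') {
            $textless[] = $index + 1; // report page numbers starting at 1
        }
    }
    return $textless;
}

/**
 * Hypothetical helper: collects the text of each page separately.
 * Assumes Composer's autoloader has been loaded so that
 * Smalot\PdfParser\Parser is available when this function is called.
 *
 * @return string[]
 */
function extractPageTexts(string $filePath): array
{
    $parser = new \Smalot\PdfParser\Parser();
    $pdf = $parser->parseFile($filePath);

    $texts = [];
    foreach ($pdf->getPages() as $page) {
        $texts[] = $page->getText();
    }
    return $texts;
}

// Usage, assuming vendor/autoload.php is loaded and $filePath is set:
// $textless = findTextlessPages(extractPageTexts($filePath));
```

If a page is reported as textless, getting its text would require running OCR on the page image with a separate tool, which is outside what pdfparser does.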