pdfparser
Parser is skipping the first page
- PHP Version: 8.2.20
- PDFParser Version: 2.11.0
Description:
I'm parsing the PDF which can be found here: https://oag.ca.gov/system/files/Maxar%20-%20Adult%20CA%20Sample%20Ltr_Redacted.pdf
The parser appears to be skipping the first page, and only extracting text from the last two.
PDF input
See link above.
Expected output & actual output
I would expect the output to start with "MAXAR SPACE SYSTEMS", or perhaps "I write on behalf of." Instead, the output begins with the following, with tabs interspersed throughout:
"not been delayed due to any law enforcement investigation. We are also taking additional actions as required..."
Code
I'm using the simplest possible code:
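use Smalot\PdfParser\Parser;
require 'vendor/autoload.php'; // Composer autoloader for smalot/pdfparser
$parser = new Parser();
$pdf = $parser->parseFile($filePath); // $filePath: path to the downloaded PDF
$text = $pdf->getText();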
The first page of the linked PDF is an image of text (along with the QR code, etc.). pdfparser can't extract text from images; it only reads text that is stored as text in the PDF itself.
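To check which pages come back without any extractable text, it can help to look at the output page by page instead of as one string. The following is a minimal sketch, not part of pdfparser: the helper `findTextlessPages` is hypothetical and only inspects an array of per-page strings. It flags pages whose text is empty or whitespace-only, which is a hint (not proof) that the page is image-only.

```php
<?php

/**
 * Hypothetical helper: return the 1-based numbers of pages whose extracted
 * text is empty or whitespace-only. Such pages are candidates for
 * image-only (scanned) content that a text extractor cannot read.
 *
 * @param string[] $pageTexts Text of each page, in page order.
 * @return int[]
 */
function findTextlessPages(array $pageTexts): array
{
    $textless = [];
    foreach (array_values($pageTexts) as $index => $text) {
        // trim() strips spaces, tabs, newlines, NUL and vertical tab by default.
        if (trim($text) === '') {
            $textless[] = $index + 1;
        }
    }
    return $textless;
}

// Hypothetical sample input: page 1 yields only whitespace.
$sample = ["\t\n ", "Text of page two", "Text of page three"];
print_r(findTextlessPages($sample));
```

With pdfparser, the input array could be built from the parsed document, for example `array_map(fn ($page) => $page->getText(), $pdf->getPages())`. An empty result for a page only means no text was found there; confirming that the page is an image still requires opening the PDF, and reading such a page would need OCR rather than pdfparser.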