
How to extract images from PDF?

Open philipnjuguna66 opened this issue 6 years ago • 14 comments

How do I extract images from a PDF? Also, if the PDF contains a table, how do I extract it along with its styles?

philipnjuguna66 avatar Dec 07 '18 13:12 philipnjuguna66

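// Assumes Composer's autoloader has already been loaded.
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('/your/pdf/file');

// Collect every image XObject in the document.
$images = $pdf->getObjectsByType('XObject', 'Image');

foreach ($images as $image) {
    // Embedding the raw stream as a JPEG data URI only works for JPEG (DCTDecode) images.
    echo '<img src="data:image/jpeg;base64,' . base64_encode($image->getContent()) . '" />';
}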

BackendDevops avatar Dec 26 '18 20:12 BackendDevops

@philipnjuguna66 is your question answered?

rubenvanerk avatar Jun 04 '20 19:06 rubenvanerk

image is FlateDecode?? no save

BochinDiaz28 avatar Sep 07 '21 17:09 BochinDiaz28

image is FlateDecode?? no save

What do you mean?

is your question answered?

I agree. If there is no feedback soon, I will close for now.

k00ni avatar Sep 08 '21 08:09 k00ni

require '../composer/vendor/autoload.php';

use Smalot\PdfParser\Parser;
use Smalot\PdfParser\XObject\Image;

$parser = new Parser();
$pdf = $parser->parseFile('mipdf.pdf');

$i = 0;
$xobjects = $pdf->getObjectsByType('XObject');
foreach ($xobjects as $xobject) {
    if ($xobject instanceof Image) {
        $content = $xobject->getContent();
        if ('FlateDecode' === $xobject->getHeader()->getElements()['Filter']->getContent()) {
            $content = zlib_decode($content,); //no save here //ERROR CODE: HERE!! PLEASE NO DECODE IMAGEN!
        }
        file_put_contents("extraidas/". ++$i .".png", $content);
    }
}

BochinDiaz28 avatar Sep 09 '21 20:09 BochinDiaz28

$content = zlib_decode($content,); //no save here //ERROR CODE: HERE!! PLEASE NO DECODE IMAGEN!

I still don't understand what exactly the problem is. What do you mean with save and decode? Please describe it a bit and don't just paste some unformatted code, not helpful.

k00ni avatar Sep 10 '21 08:09 k00ni

Okay! When I extract the images from a PDF, several of them are not detected because they are compressed or encrypted. According to what I read, I must pass them through a function like zlib_decode, but neither it nor the similar PHP functions returns the image; I always get an error code. I uploaded the same PDF to some online extraction sites and they return the image I need just fine. Sorry if I did not explain it well. Here is a reference link: https://stackoverflow.com/questions/59374914/how-to-extract-images-from-pdf-using-php

BochinDiaz28 avatar Sep 10 '21 13:09 BochinDiaz28

Okay! When I extract the images from a PDF, several of them are not detected because they are compressed or encrypted. According to what I read, I must pass them through a function like zlib_decode, but neither it nor the similar PHP functions returns the image; I always get an error code. I uploaded the same PDF to some online extraction sites and they return the image I need just fine. Sorry if I did not explain it well. Here is a reference link: https://stackoverflow.com/questions/59374914/how-to-extract-images-from-pdf-using-php

The reference link is not working. Please provide another relevant resource so this is easier to understand.

skverma618 avatar Sep 01 '22 14:09 skverma618

Undefined type 'Image'

The above error is shown for your code.

skverma618 avatar Sep 01 '22 14:09 skverma618

$content = zlib_decode($content,); //no save here //ERROR CODE: HERE!! PLEASE NO DECODE IMAGEN!

I still don't understand what exactly the problem is. What do you mean with save and decode? Please describe it a bit and don't just paste some unformatted code, not helpful.

The problem is that when I try to extract PNG images, they can't be extracted. Images stored with the FlateDecode filter come out corrupted and unreadable.

sbhshoaib avatar Aug 27 '23 15:08 sbhshoaib
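
A possible explanation for the corrupted files described above: in a PDF, FlateDecode is only zlib compression, and the inflated stream of an image XObject normally holds bare pixel samples rather than a complete PNG file, so writing those bytes to a .png file does not produce a readable image. The sketch below wraps such pixel data in a minimal PNG container. The function name rawPixelsToPng is hypothetical and not part of pdfparser. It assumes 8 bits per component, DeviceRGB or DeviceGray, no predictor in DecodeParms, and that you already have the inflated bytes plus the Width and Height values from the image header. It needs PHP's zlib extension. Whether getContent() in your pdfparser version returns the stream already inflated is something to check yourself; if zlib_decode fails on it, that may be the reason.

```php
<?php
/**
 * Wrap raw, already inflated pixel samples in a minimal PNG file.
 * Hypothetical helper for illustration; not part of pdfparser.
 *
 * @param string $pixels   Raw samples, row by row, 8 bits per component.
 * @param int    $width    Image width in pixels (the /Width header entry).
 * @param int    $height   Image height in pixels (the /Height header entry).
 * @param int    $channels 3 for DeviceRGB, 1 for DeviceGray.
 */
function rawPixelsToPng(string $pixels, int $width, int $height, int $channels = 3): string
{
    if ($channels !== 1 && $channels !== 3) {
        throw new InvalidArgumentException('Only 1 (gray) or 3 (RGB) channels are supported.');
    }
    if ($width < 1 || $height < 1) {
        throw new InvalidArgumentException('Width and height must be positive.');
    }
    $rowBytes = $width * $channels;
    if (strlen($pixels) !== $rowBytes * $height) {
        throw new InvalidArgumentException('Pixel data length does not match the dimensions.');
    }

    // PNG stores each scanline with a leading filter-type byte; 0 means "no filter".
    $scanlines = '';
    for ($y = 0; $y < $height; ++$y) {
        $scanlines .= "\x00" . substr($pixels, $y * $rowBytes, $rowBytes);
    }

    // A PNG chunk is: length, type, data, CRC-32 over type and data.
    $chunk = static function (string $type, string $data): string {
        return pack('N', strlen($data)) . $type . $data . pack('N', crc32($type . $data));
    };

    // PNG color type 2 is truecolor (RGB), 0 is grayscale.
    $colorType = $channels === 3 ? 2 : 0;

    return "\x89PNG\r\n\x1a\n"
        . $chunk('IHDR', pack('NNCCCCC', $width, $height, 8, $colorType, 0, 0, 0))
        . $chunk('IDAT', gzcompress($scanlines)) // gzcompress() emits a zlib stream, as PNG requires
        . $chunk('IEND', '');
}

// Usage: a 2x2 image with red, green, blue and white pixels.
$png = rawPixelsToPng("\xFF\x00\x00\x00\xFF\x00\x00\x00\xFF\xFF\xFF\xFF", 2, 2);
// file_put_contents('example.png', $png);
```

Images that use a predictor, an indexed palette, CMYK, or a bit depth other than 8 need extra handling that this sketch leaves out. JPEG images (DCTDecode) are a different case: their stream is already a complete JPEG file.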