pdfparser

Exception Missing catalog

Open geraldmwanyika opened this issue 11 years ago • 49 comments

I am getting an exception when trying to parse a PDF.

Exception

Missing catalog.

        return $pages;
    } elseif (isset($this->dictionary['Page'])) {
        // Search for 'page' (unordered pages).
        $pages = $this->getObjectsByType('Page');
        return array_values($pages);
    } else {
        throw new \Exception('Missing catalog.');
    }
}

geraldmwanyika avatar Apr 11 '14 11:04 geraldmwanyika

Can you show the PDF file?

DanielRuf avatar Apr 12 '14 10:04 DanielRuf

I have a similar problem.

Here you can find a broken pdf: https://github.com/jaberu/PhpOfficeUtils/blob/master/test/resources/missing-catalog.pdf

jaberu avatar Jul 30 '16 17:07 jaberu

I have the same problem parsing a PDF. The exception is thrown in the line 246 inside the file src\Smalot\PdfParser\Document.php with the message Missing catalog.

SiroDiaz avatar Jan 24 '17 20:01 SiroDiaz

I have the same trouble when parsing with pdfparser. Fatal error: Uncaught exception 'Exception' with message 'Missing catalog.'

It comes from the last line of public function getPages() in Document.php: throw new \Exception('Missing catalog.');

So, does anybody know a workaround?

AndyABX avatar Jan 31 '17 19:01 AndyABX

Same Problem

drexlma avatar Apr 13 '17 08:04 drexlma

Hi

Me too, but only with one long and heavy PDF. Others, even ones with multiple pages, work OK.

renaudham avatar Jun 22 '17 14:06 renaudham

@renaudham it would be great to have a test file which throws this exception.

DanielRuf avatar Jun 22 '17 14:06 DanielRuf

Hi

Here is one attached: test.pdf

I have analyzed Document.php, and it seems that the dictionary ends up empty because no Type is returned here:

protected function buildDictionary()
{
    // Build dictionary.
    $this->dictionary = array();

    foreach ($this->objects as $id => $object) {
        $type = $object->getHeader()->get('Type')->getContent();

        // var_dump($type);
        if (!empty($type)) {
            $this->dictionary[$type][$id] = $id;
        }
    }
}
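To illustrate the analysis above, here is a minimal, self-contained sketch of how objects without a Type entry lead to an empty dictionary and then to the exception. It uses plain arrays instead of the library's object classes, and the names buildTypeDictionary and listPageIds are hypothetical, not part of pdfparser.

```php
<?php

/**
 * Group object ids by their 'Type' entry, skipping objects without one.
 * Hypothetical stand-in for the library's buildDictionary(); objects are
 * plain arrays here, not pdfparser objects.
 */
function buildTypeDictionary(array $objects): array
{
    $dictionary = [];
    foreach ($objects as $id => $object) {
        $type = $object['Type'] ?? '';
        if ($type !== '') {
            $dictionary[$type][$id] = $id;
        }
    }

    return $dictionary;
}

/**
 * Return the ids of all 'Page' objects. Throws when the dictionary holds
 * no Catalog, Pages or Page entry at all (simplified lookup).
 */
function listPageIds(array $dictionary): array
{
    if (isset($dictionary['Page'])) {
        return array_values($dictionary['Page']);
    }
    if (isset($dictionary['Catalog']) || isset($dictionary['Pages'])) {
        return [];
    }

    throw new \Exception('Missing catalog.');
}
```

With objects such as ['1_0' => [], '2_0' => []] (no Type anywhere), buildTypeDictionary() returns an empty array and listPageIds() throws, which matches the behaviour described in this thread.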

So for my usage I simply modified a few lines. In getPages() I return an empty array before the exception is thrown:

public function getPages()
{
    ...
    return array(); // added: the throw below is no longer reached
    throw new \Exception('Missing catalog.');
    ...

And in getText() I return false when there are no pages:

public function getText(Page $page = null)
{
    ...
    if (count($pages) == 0) {
        return false;
    }
    ...

That way, when I load a document with this issue (no types), I receive false, which I can use to switch to a different treatment for unparsable files (of course I get zero content from such a PDF).
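The same "false on unparsable files" behaviour can probably also be had without editing the library, by catching the exception in the calling code. A minimal sketch, assuming the extraction is passed in as a callable; textOrFalse is a hypothetical helper, and the commented usage assumes pdfparser's Parser::parseFile() and Document::getText():

```php
<?php

/**
 * Run a text-extraction callable and return false instead of letting a
 * parser exception abort the script. Hypothetical helper, not part of
 * pdfparser.
 *
 * @return string|false
 */
function textOrFalse(callable $extract)
{
    try {
        return $extract();
    } catch (\Exception $e) {
        // For example 'Missing catalog.' for PDFs the parser cannot read.
        return false;
    }
}

// Usage sketch (assumes smalot/pdfparser is installed via composer):
// $parser = new \Smalot\PdfParser\Parser();
// $text = textOrFalse(function () use ($parser) {
//     return $parser->parseFile('document.pdf')->getText();
// });
// if ($text === false) { /* switch to the fallback treatment */ }
```

This keeps the vendor directory untouched, so the workaround survives a composer update.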

thanks

renaudham avatar Jun 22 '17 15:06 renaudham

The loop foreach ($this->objects as $id => $object) { $type = $object->getHeader() ... } (I cut the rest) is what reads the type and content.

getHeader() only returns the object's elements,

but it seems that there is no "Type", and also no content at all, extracted into $this->objects.

renaudham avatar Jun 22 '17 15:06 renaudham

I added this as a question on Stack Overflow to see if we could get some help.

https://stackoverflow.com/questions/48173527/continue-a-script-after-an-exception-is-thrown-php

The answer for me was to make sure my Exception was namespaced in my try/catch block.

So this: catch (\Exception $e) , instead of catch (Exception $e).
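A small sketch of why the leading backslash matters. The namespace App and the function names are made up for illustration, and parseBrokenPdf() only simulates the parser failure. Inside a namespace, an unqualified Exception in a catch clause refers to App\Exception, which does not match the global \Exception, so the exception is not caught there.

```php
<?php

namespace App;

/** Stand-in for a parser call that fails the way pdfparser does. */
function parseBrokenPdf(): string
{
    throw new \Exception('Missing catalog.');
}

/** Catch with the fully qualified global class: the exception is handled. */
function parseWithQualifiedCatch(): string
{
    try {
        return parseBrokenPdf();
    } catch (\Exception $e) {
        return 'caught: ' . $e->getMessage();
    }
}

/**
 * Catch with an unqualified name: inside namespace App this means
 * App\Exception, which does not match, so the exception propagates.
 */
function parseWithUnqualifiedCatch(): string
{
    try {
        return parseBrokenPdf();
    } catch (Exception $e) {
        return 'caught: ' . $e->getMessage();
    }
}
```

Adding "use Exception;" at the top of the file would be an alternative to writing the backslash in every catch clause.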

tim-peterson avatar Jan 09 '18 17:01 tim-peterson

It is already namespaced and not directly related to the pdfparser bug.

DanielRuf avatar Jan 09 '18 17:01 DanielRuf

I am facing the same issue when trying to upload multiple PDFs, even though my script uses the namespace properly.

aavrug avatar Jan 18 '18 06:01 aavrug

The namespace is not the problem but the parsing of the PDF.

DanielRuf avatar Jan 18 '18 06:01 DanielRuf

Also please provide the full error/exception + stacktrace that you get.

DanielRuf avatar Jan 18 '18 06:01 DanielRuf

Here you can see the stack trace. https://gist.github.com/aavrug/ee26ebc55f618b8bc93823df23470a51

aavrug avatar Jan 18 '18 06:01 aavrug

I am having this same problem. Is there any chance of a fix?

barrychapman avatar Jan 25 '18 15:01 barrychapman

I have the same issue. I can tell that if I open the file (on a mac) in Preview.app and save the pdf out and try to parse it again, then it works fine.

djlift avatar Feb 22 '18 15:02 djlift

So it is related to the tool that generates the PDF. I guess this depends on the used PDF version.

Can you check with Adobe Reader or some other tools which PDF version is used in both cases @djlift?
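If no viewer is at hand, the version can usually be read from the file itself, since a PDF starts with a header line such as %PDF-1.6. A minimal sketch (pdfHeaderVersion is a hypothetical helper; it only looks at the header and ignores a possible /Version override in the document catalog):

```php
<?php

/**
 * Read the version from a PDF header line such as "%PDF-1.6".
 * Returns null when no header is found in the first 1024 bytes.
 * A /Version entry in the document catalog may override the header;
 * this sketch does not inspect it.
 */
function pdfHeaderVersion(string $bytes): ?string
{
    $head = substr($bytes, 0, 1024);
    if (preg_match('/%PDF-(\d+\.\d+)/', $head, $matches) === 1) {
        return $matches[1];
    }

    return null;
}

// Usage sketch: read only the start of the file.
// $head = (string) file_get_contents('document.pdf', false, null, 0, 1024);
// $version = pdfHeaderVersion($head);
```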

DanielRuf avatar Feb 22 '18 15:02 DanielRuf

I just confirmed that I had a v1.6 file; after saving it down to v1.3, I no longer experience the issue.

djlift avatar Feb 23 '18 20:02 djlift

So it is still an issue with the data structure of newer PDF versions.

DanielRuf avatar Feb 23 '18 23:02 DanielRuf

I would imagine, yes. I don't really know much about the structures or differences of the different versions unfortunately.

djlift avatar Feb 24 '18 00:02 djlift

Same issue. What do we need to fix this?

qlstorm avatar Apr 03 '18 16:04 qlstorm

@smalot Is there any news about a fix for this issue?

Thank you so much!

alejandr0 avatar Jul 27 '18 11:07 alejandr0

Same here, many thanks in advance for this great library.

Pesche007 avatar Nov 23 '18 10:11 Pesche007

Hello Everyone!

I was also facing the same missing catalog fatal error.

I wrapped the code in try, throw and catch, and now I am no longer getting the missing catalog fatal error. Below is the code where I applied try, throw and catch:

public function getPages()
{
    try {
        if (isset($this->dictionary['Catalog'])) {
            // Search for catalog to list pages.
            $id = reset($this->dictionary['Catalog']);

            /** @var Pages $object */
            $object = $this->objects[$id]->get('Pages');
            if (method_exists($object, 'getPages')) {
                $pages = $object->getPages(true);
                return $pages;
            }
        }

        if (isset($this->dictionary['Pages'])) {
            // Search for pages to list kids.
            $pages = array();

            /** @var Pages[] $objects */
            $objects = $this->getObjectsByType('Pages');
            foreach ($objects as $object) {
                $pages = array_merge($pages, $object->getPages(true));
            }

            return $pages;
        }

        if (isset($this->dictionary['Page'])) {
            // Search for 'page' (unordered pages).
            $pages = $this->getObjectsByType('Page');

            return array_values($pages);
        }

        throw new \Exception('Missing catalog.');
    } catch (\Exception $e) {
        // No catalog found: return an empty page list instead of failing.
        // (The code as first posted set $pages = '0' here without returning it,
        // so the method returned null.)
        return array();
    }
}

Best of luck!!

hnk15 avatar Jan 19 '19 05:01 hnk15

@hnk15 can you provide these changes as patch file?

DanielRuf avatar Jan 19 '19 08:01 DanielRuf

@DanielRuf what do you mean by patch file?

hnk15 avatar Jan 19 '19 09:01 hnk15

A patch file is a file used by patch tools to change files based on a diff.

See https://patch-diff.githubusercontent.com/raw/smalot/pdfparser/pull/224.patch

DanielRuf avatar Jan 19 '19 09:01 DanielRuf

Yeah come on @hnk15 , I also have this bug, more of us could make great use of your patch.

vnbenny avatar Mar 13 '19 13:03 vnbenny

I just checked, and the fix suggested by @hnk15 has not been added to the code yet; however, I tested it and it resolved my issue. If you're not sure what to do but still get the missing catalog fatal error, download the file Font.php from your server. If you used composer to install the package, it is most likely located in your vendor directory, something like this: vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php

Open the file in your editor and replace line 259, which is: $part = pack('H*', $part); with the following line of code: $part = pack('H*', str_replace(' ', '', sprintf('%u', CRC32($part))));

Save the changes and re-upload the file to the same location.

usabilitest avatar May 26 '19 22:05 usabilitest