pdfparser
Exception Missing catalog
I am getting an exception when trying to parse a PDF.
Exception
Missing catalog.
return $pages;
} elseif (isset($this->dictionary['Page'])) {
// Search for 'page' (unordered pages).
$pages = $this->getObjectsByType('Page');
return array_values($pages);
} else {
throw new \Exception('Missing catalog.');
}
}
Can you show the PDF file?
I have a similar problem.
Here you can find a broken pdf: https://github.com/jaberu/PhpOfficeUtils/blob/master/test/resources/missing-catalog.pdf
I have the same problem parsing a PDF. The exception is thrown on line 246 of src\Smalot\PdfParser\Document.php with the message Missing catalog.
I have the same trouble when parsing with pdfparser. Fatal error: Uncaught exception 'Exception' with message 'Missing catalog.'
It comes from the last line of public function getPages() in Document.php: throw new \Exception('Missing catalog.');
So, does anybody know a workaround?
Same Problem
Hi
Me too, but only with one long and heavy PDF. Other PDFs, even ones with multiple pages, work fine.
@renaudham it would be great to have a test file that throws this exception.
Hi
Here is one attached.
I have analyzed Document.php and it seems that the dictionary ends up empty because no Type is returned here:
protected function buildDictionary()
{
    // Build dictionary.
    $this->dictionary = array();
    foreach ($this->objects as $id => $object) {
        $type = $object->getHeader()->get('Type')->getContent();
        // var_dump($type);
        if (!empty($type)) {
            $this->dictionary[$type][$id] = $id;
        }
    }
}
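To illustrate why an empty Type leaves nothing for getPages() to find, here is a minimal sketch of the same grouping logic. It is not the library's code: the function name buildDictionarySketch and the plain arrays standing in for parsed PDF objects are assumptions made for illustration only.

```php
<?php
// Hypothetical stand-in for the library's objects: each entry is just an
// array of header fields keyed by object id.
function buildDictionarySketch(array $objects)
{
    $dictionary = array();
    foreach ($objects as $id => $header) {
        // Mirrors the check above: objects without a Type are skipped.
        $type = isset($header['Type']) ? $header['Type'] : '';
        if (!empty($type)) {
            $dictionary[$type][$id] = $id;
        }
    }
    return $dictionary;
}

// A document whose objects carry a Type yields Catalog/Pages/Page entries.
$ok = buildDictionarySketch(array(
    '1_0' => array('Type' => 'Catalog'),
    '2_0' => array('Type' => 'Pages'),
    '3_0' => array('Type' => 'Page'),
));

// A document whose objects expose no Type yields an empty dictionary, so
// none of the Catalog, Pages or Page lookups can succeed.
$broken = buildDictionarySketch(array(
    '1_0' => array(),
    '2_0' => array('Length' => 120),
));
```

With the second input the dictionary stays empty, which is the situation in which getPages() reaches the throw.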
So for my usage I simply modified a few lines:
public function getPages()
{
    ...
    return array();
    throw new \Exception('Missing catalog.'); // no longer reached after the return
    ...
public function getText(Page $page = null)
{
    ...
    if (count($pages) == 0) {
        return false;
    }
That way, when I call a document with this issue (no types), I receive false, which I can use to switch to a different treatment for unparsable documents (as of course I will get zero content from this PDF).
Thanks.
In the loop foreach ($this->objects as $id => $object) { $type = $object->getHeader() .... (I cut the rest, which gets the type and content), getHeader only returns the object's elements.
But it seems that not only is there no "type", there is also zero content extracted in $this->objects.
I added this as a question on Stack Overflow to see if we could get some help.
https://stackoverflow.com/questions/48173527/continue-a-script-after-an-exception-is-thrown-php
The answer for me was to make sure my Exception was namespaced in my try/catch block.
So this: catch (\Exception $e), instead of catch (Exception $e).
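A minimal sketch of that caller-side pattern follows. The helper name extractTextOrNull and the stub closures are hypothetical; in real use the callable would wrap whatever parsing call you make (for example the library's Parser::parseFile() followed by getText()). This only keeps the script running; it does not make the PDF parseable.

```php
<?php
// Hypothetical helper: $parse is any callable that returns the extracted
// text or throws an exception.
function extractTextOrNull(callable $parse, $file)
{
    try {
        return $parse($file);
    } catch (\Exception $e) {
        // The leading backslash always refers to the global Exception class.
        // Inside a namespaced file, an unqualified "Exception" would resolve
        // to a class in that namespace and would not match.
        return null;
    }
}

// Stub that mimics the failure discussed in this thread.
$failing = function ($file) {
    throw new \Exception('Missing catalog.');
};

// Stub that mimics a successful parse.
$working = function ($file) {
    return 'text of ' . $file;
};

$fromBroken = extractTextOrNull($failing, 'broken.pdf');
$fromGood = extractTextOrNull($working, 'ok.pdf');
```

Returning null lets the caller distinguish "could not parse" from an empty but valid document.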
It is already namespaced and not directly related to the pdfparser bug.
I am facing the same issue when trying to upload multiple PDFs, even though I have specified the namespace properly in my script.
The namespace is not the problem but the parsing of the PDF.
Also please provide the full error/exception + stacktrace that you get.
Here you can see the stack trace. https://gist.github.com/aavrug/ee26ebc55f618b8bc93823df23470a51
I am having this same problem. Is there any chance of a fix?
I have the same issue. I can tell that if I open the file (on a Mac) in Preview.app, save the PDF out and try to parse it again, then it works fine.
So it is related to the tool that generates the PDF. I guess this depends on the PDF version used.
Can you check with Adobe Reader or some other tools which PDF version is used in both cases @djlift?
I just confirmed that I had a v1.6 file; after saving it down to v1.3 I no longer experience the issue.
So it is still an issue with the data structure of newer PDF versions.
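For anyone who wants to check the version without opening a viewer, here is a rough sketch that reads it from the file header. The function name pdfHeaderVersion and the file path are placeholders. Note that the header is only a hint: a PDF can override it with a Version entry in its catalog, which this sketch does not read.

```php
<?php
// Returns the version from a PDF header such as "%PDF-1.6", or null if
// no header is found in the first 1024 bytes.
function pdfHeaderVersion($bytes)
{
    if (preg_match('/%PDF-(\d+\.\d+)/', substr($bytes, 0, 1024), $m)) {
        return $m[1];
    }
    return null;
}

// Example with an in-memory header.
$version = pdfHeaderVersion("%PDF-1.6\n%\xE2\xE3\xCF\xD3\n");

// With a real file (placeholder path), read only the first bytes:
// $version = pdfHeaderVersion(file_get_contents('document.pdf', false, null, 0, 1024));
```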
I would imagine, yes. I don't really know much about the structures or differences of the different versions unfortunately.
Same issue. What do we need to fix this?
@smalot Is there any news about the fix for this issue?
Thank you so much!
Same here, many thanks in advance for this great library.
Hello Everyone!
I was also facing the same problem of missing catalog fatal error.
I wrapped the code in try, throw and catch, and now I am no longer getting the missing catalog fatal error. Below is the code where I applied try, throw and catch:
public function getPages()
{
    try {
        if (isset($this->dictionary['Catalog'])) {
            // Search for catalog to list pages.
            $id = reset($this->dictionary['Catalog']);
            /** @var Pages $object */
            $object = $this->objects[$id]->get('Pages');
            if (method_exists($object, 'getPages')) {
                $pages = $object->getPages(true);
                return $pages;
            }
        }
        if (isset($this->dictionary['Pages'])) {
            // Search for pages to list kids.
            $pages = array();
            /** @var Pages[] $objects */
            $objects = $this->getObjectsByType('Pages');
            foreach ($objects as $object) {
                $pages = array_merge($pages, $object->getPages(true));
            }
            return $pages;
        }
        if (isset($this->dictionary['Page'])) {
            // Search for 'page' (unordered pages).
            $pages = $this->getObjectsByType('Page');
            return array_values($pages);
        }
        throw new \Exception('Missing catalog.');
    } catch (\Exception $e) {
        // Return an empty list so callers can still iterate over the result.
        // (As first posted, this branch set $pages = '0' without returning it,
        // so the method returned null.)
        return array();
    }
}
Best of luck!!
@hnk15 can you provide these changes as patch file?
@DanielRuf what do you mean by patch file?
A patch file is a file used by patch tools to change files based on a diff.
See https://patch-diff.githubusercontent.com/raw/smalot/pdfparser/pull/224.patch
Yeah, come on @hnk15, I also have this bug, and more of us could make great use of your patch.
I just checked, and the fix suggested by @hnk15 has not been added to the code yet; however, I tested it and it resolved my issue. If you're not sure what to do but still get the missing catalog fatal error, simply download the file Font.php from your server. If you used Composer to install the package, it will most likely be located in your vendor directory, something like this: vendor/smalot/pdfparser/src/Smalot/PdfParser/Font.php
Open the file in your editor and replace line 259, which is:
$part = pack('H*', $part);
with the following line of code:
$part = pack('H*', str_replace(' ', '', sprintf('%u', CRC32($part))));
Save the changes and re-upload the file to the same location.