pdf-to-html
How to handle special chars?
My code to convert a PDF into an HTML file is:
\Gufy\PdfToHtml\Config::set('pdftohtml.bin', '/usr/local/bin/pdftohtml');
\Gufy\PdfToHtml\Config::set('pdfinfo.bin', '/usr/local/bin/pdfinfo');
$pdf = new \Gufy\PdfToHtml\Pdf('MY_DOCUMENT_PATH.pdf');
$page = $pdf->html();
Everything was working fine, but now the PDF document contains some special characters and I'm getting the following error message:
DOMDocument::loadHTML(): Invalid char in CDATA 0x1 in Entity, line: ...
I tried both $pdf->html() and $pdf->getDom() and get the same error with each.
With libxml_use_internal_errors(true) no errors are reported, but after conversion the content appears twice.
How can I avoid this error message, or remove the special characters?
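To illustrate what "removing the special chars" could look like, here is a minimal sketch in plain PHP. It assumes you can get hold of the raw HTML string before it reaches DOMDocument::loadHTML(). Whether and where pdf-to-html exposes that string is not something I have confirmed. The helper name stripControlChars is hypothetical and not part of the library. It removes the ASCII control characters (such as the 0x1 from the error message) and keeps tab, line feed and carriage return.

```php
<?php
// Hypothetical helper, NOT part of gufy/pdf-to-html.
// Removes ASCII control characters (0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F),
// keeping tab (0x09), line feed (0x0A) and carriage return (0x0D).
function stripControlChars(string $html): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $html);
}

// Sample markup containing the 0x1 byte named in the error message.
$raw = "<html><body><p>Price:\x01 10</p></body></html>";
$clean = stripControlChars($raw);

// Collect libxml errors instead of emitting warnings, so any that remain
// can be inspected rather than silently ignored.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($clean);
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), PHP_EOL;
}
libxml_clear_errors();

echo $dom->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```

Unlike only calling libxml_use_internal_errors(true), this takes the offending bytes out of the input, so the parser never sees them. It does not explain the doubled content, which may have a different cause.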