PHPWord
HTML generated from a Docx is way too big
Describe the Bug
I use PHPWord (and PhpSpreadsheet) to convert Word/Excel files (and their OpenOffice equivalents) to PDF, by first converting them to HTML and then running the HTML through Dompdf. When a file contains pictures, the size of the rendered HTML explodes: for instance, a 1 MB .docx file becomes a 36 MB HTML string. (The PDF conversion then brings it back down to 26 MB, which is still far too much.) After dumping the HTML, my guess is that the base64 encoding of the pictures is what inflates the output.
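One way to check that guess is to measure how much of the HTML string is base64 data. The following is a minimal sketch, assuming the images are embedded as `data:` URIs containing a `;base64,` marker. The helper name `measureBase64Payload` is hypothetical and not part of PHPWord. For reference, base64 encodes every 3 bytes as 4 characters, so on its own it adds roughly 33% to the raw image bytes.

```php
<?php
// Hypothetical helper (not part of PHPWord): reports how many bytes of an
// HTML string are taken up by base64 payloads such as embedded images.
function measureBase64Payload(string $html): array
{
    $marker = ';base64,';
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/=';
    $images = 0;
    $payloadBytes = 0;
    $offset = 0;

    // Walk the string marker by marker instead of using one large regex,
    // so multi-megabyte payloads do not hit PCRE limits.
    while (($pos = strpos($html, $marker, $offset)) !== false) {
        $start = $pos + strlen($marker);
        // Count the consecutive base64-alphabet characters after the marker.
        $length = strspn($html, $alphabet, $start);
        $images++;
        $payloadBytes += $length;
        $offset = $start + $length;
    }

    return [
        'images' => $images,
        'base64_bytes' => $payloadBytes,
        'total_bytes' => strlen($html),
    ];
}

// Usage with a synthetic string; in the real case, pass the HTML from the writer.
$sample = '<p>Hi</p><img src="data:image/png;base64,' . base64_encode(str_repeat('x', 300)) . '"/>';
$stats = measureBase64Payload($sample);
printf(
    "%d image(s), %d of %d bytes are base64\n",
    $stats['images'],
    $stats['base64_bytes'],
    $stats['total_bytes']
);
```

If `base64_bytes` accounts for nearly all of `total_bytes`, the embedded pictures are the cause of the size, and comparing it with the size of the original image files would show whether base64 alone explains the growth.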
Steps to Reproduce
Using this file: test-long.docx
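<?php
require __DIR__ . '/vendor/autoload.php';

$path = 'PATH_TO_FILE';
// Load the document; IOFactory::load() takes the file path.
$phpWord = \PhpOffice\PhpWord\IOFactory::load($path, 'Word2007');
$htmlWriter = new \PhpOffice\PhpWord\Writer\HTML($phpWord);
// Render the whole document to a single HTML string.
$html = $htmlWriter->getContent();
// Print the size of the generated HTML in bytes.
echo strlen($html);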
Expected Behavior
I would expect the output to be more compact. I understand that the conversion may produce a bigger file, but in this case it is more than 10x bigger.
Context
Please fill in your environment information:
- PHP 7.4.28 (cli) (built: Mar 3 2022 09:59:56) ( NTS ) Copyright (c) The PHP Group Zend Engine v3.4.0, Copyright (c) Zend Technologies
- PHPWord Version 0.18.2
- Server is a dockerized Ubuntu based on the php:7.4-fpm image.
Best regards,