
HTML generated from a Docx is way too big

Status: Open. Opened by Mouke.

Describe the Bug

I use PHPWord (and PhpSpreadsheet) to convert Word/Excel files (and their OpenOffice equivalents) to PDF: I first convert them to HTML, then render that HTML with DomPDF. When a file contains pictures, the size of the generated HTML explodes. For instance, a 1MB .docx file becomes a 36MB HTML string, and the PDF conversion only brings it back down to 26MB, which is still far too large. After dumping the HTML, my guess is that the base64 encoding of the pictures is what makes the size blow up.
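For scale: base64 encodes every 3 input bytes as 4 output characters (padding the last group), so the encoding by itself makes an image roughly a third bigger. The short sketch below checks that ratio with PHP's built-in base64_encode(); the helper name base64Length() is hypothetical and only illustrates the formula.

```php
<?php
// Hypothetical helper: expected base64 output length for $n input bytes.
// Each 3-byte group becomes 4 characters; a trailing partial group is padded to 4.
function base64Length(int $n): int
{
    return 4 * intdiv($n + 2, 3);
}

$raw = random_bytes(1000);      // stand-in for 1000 bytes of image data
$encoded = base64_encode($raw);

echo strlen($encoded), "\n";    // 1336
echo base64Length(1000), "\n";  // 1336
printf("%.2fx\n", strlen($encoded) / strlen($raw)); // 1.34x
```

If the HTML turns out to be much larger than about 1.34x the embedded image data, something besides the encoding itself (for example, an image being embedded more than once) may also be contributing.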

Steps to Reproduce

Using this file: test-long.docx

<?php
require __DIR__ . '/vendor/autoload.php';

$path = 'PATH_TO_FILE';

// IOFactory::load() expects a file path, not the file's contents
$phpWord = \PhpOffice\PhpWord\IOFactory::load($path, 'Word2007');
$htmlWriter = new \PhpOffice\PhpWord\Writer\HTML($phpWord);
$html = $htmlWriter->getContent();
echo strlen($html);
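To test the guess that embedded pictures dominate the output, one option is to measure how much of the HTML string sits inside base64 data: URIs. The sketch below assumes the images appear as data:<mime>;base64,<payload> strings (the exact markup PHPWord emits is an assumption here), and the function name dataUriBytes() is hypothetical.

```php
<?php
// Hypothetical helper: sums the length of every base64 data: URI in an HTML
// string, assuming they look like data:<mime>;base64,<payload>.
function dataUriBytes(string $html): int
{
    preg_match_all('#data:[\w/+.-]+;base64,[A-Za-z0-9+/=]+#', $html, $matches);
    return array_sum(array_map('strlen', $matches[0]));
}

// Synthetic input standing in for the string returned by getContent().
$html = '<p>Hello</p><img src="data:image/png;base64,'
    . base64_encode(random_bytes(3000)) . '"/>';

$imageBytes = dataUriBytes($html);
printf(
    "%d of %d bytes (%.1f%%) are data: URIs\n",
    $imageBytes,
    strlen($html),
    100 * $imageBytes / strlen($html)
);
```

Running dataUriBytes() on the $html produced by the reproduction script would show what share of the 36MB is image data and what share is markup and styling.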

Expected Behavior

I would expect the output to be more concise. I understand that the conversion may produce a bigger file, but in this case it is more than 10x bigger.

Context

Please fill in your environment information:

  • PHP 7.4.28 (cli) (built: Mar 3 2022 09:59:56) ( NTS ) Copyright (c) The PHP Group Zend Engine v3.4.0, Copyright (c) Zend Technologies
  • PHPWord Version 0.18.2
  • Server is a dockerized Ubuntu based on the php:7.4-fpm image.

Best regards,

Mouke, Mar 09 '22 15:03