Feature request: Support EMF image

Open bakkan opened this issue 7 years ago • 19 comments

This is:

  • [√] a bug report
  • [√] a feature request

Expected Behavior

Support EMF image.

Failure Information

Throws a PhpOffice\PhpWord\Exception\InvalidImageException exception. Exception message:

Invalid image: zip:///Users/xxx/Downloads/xxxx.docx#word/media/image.emf
#0 /works/shared/laravel/vendor/phpoffice/phpword/src/PhpWord/Element/Image.php(149): PhpOffice\PhpWord\Element\Image->checkImage()
#1 [internal function]: PhpOffice\PhpWord\Element\Image->__construct('zip:///Users/hu...', NULL, false, 'Picture 18')

How to Reproduce

The document file contains EMF-format images. Googling "emf" led me to this page: https://fileinfo.com/extension/emf

<?php
use PhpOffice\PhpWord\IOFactory;

$file = '/path/to/file.docx';
$phpWord = IOFactory::load($file);
$sections = $phpWord->getSections();
foreach ($sections as $section) {
    $elements = $section->getElements();
    foreach ($elements as $element) {
        // do something else...
    }
}

Context

  • PHP version: PHP 7.1.16
  • PHPWord version: 0.15.0

bakkan avatar Sep 29 '18 08:09 bakkan

PHPWord uses the getimagesize() function to get image info, and getimagesize() doesn't support the EMF format. 😂😂
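A minimal sketch of that behaviour, assuming a hand-built 88-byte buffer shaped like an EMF header (the bytes are illustrative, not a real image). It uses getimagesizefromstring(), which runs the same type detection as getimagesize(), and the call is expected to report the data as unrecognised by returning false.

```php
<?php
// Illustrative buffer laid out like a minimal EMF header (not a real image).
$emfHeader = pack('V2', 1, 88)   // Type = EMR_HEADER (1), Size = 88
    . str_repeat("\0", 32)       // Bounds and Frame rectangles, zeroed here
    . pack('V', 0x464D4520)      // record signature, stored on disk as " EMF"
    . pack('V', 0x00010000)      // version
    . str_repeat("\0", 40);      // remainder of the fixed-size header

// Same detection logic as getimagesize(), but fed from memory.
$result = @getimagesizefromstring($emfHeader);
$isRecognised = ($result !== false);

echo $isRecognised ? "recognised\n" : "not recognised\n";
```

When this detection fails, PHPWord's checkImage() ends up throwing the InvalidImageException shown in the report above.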

bakkan avatar Sep 29 '18 08:09 bakkan

I am using phpword dev-master and see this error:

[Mon, 08 Apr 2019 09:26:50 +0700] [127.0.0.1] [Error(1): Uncaught exception 'PhpOffice\PhpWord\Exception\InvalidImageException' with message 'Invalid image: zip:///opt/lampp/temp/php4WlqvI#word/media/image1.emf' in /media/hongoctrien/DATA/MyHost/NukeViet/module-nvonlinetest-01.vn/vendor/phpoffice/phpword/src/PhpWord/Element/Image.php:418
Stack trace:
#0 /media/hongoctrien/DATA/MyHost/NukeViet/module-nvonlinetest-01.vn/vendor/phpoffice/phpword/src/PhpWord/Element/Image.php(149): PhpOffice\PhpWord\Element\Image->checkImage()
#1 [internal function]: PhpOffice\PhpWord\Element\Image->__construct('zip:///opt/lamp...')
#2 /media/hongoctrien/DATA/MyHost/NukeViet/module-nvonlinetest-01.vn/vendor/phpoffice/phpword/src/PhpWord/Element/AbstractContainer.php(145): ReflectionClass->newInstanceArgs(Array)
#3 [internal function]: PhpOffice\PhpWord\Element\AbstractContainer->addElement('Image', 'zip:///opt/lamp...')
#4 /media/hongoctrien/DATA/MyHost/NukeViet/module-nvonlinetest-01.vn/vendor/phpoffice/phpword/src/PhpWord/Element/AbstractContainer.php(112): call_user_func_array(Array] [FILE: /vendor/phpoffice/phpword/src/PhpWord/Element/Image.php] [LINE: 418]

mynukeviet avatar Apr 08 '19 03:04 mynukeviet

Any news on this issue? Will this be addressed sooner or later?

derKroisi avatar Nov 24 '21 09:11 derKroisi

I encountered this error just now. I guess EMF format is becoming more commonly used in modern docx files

ThomazPom avatar May 07 '22 23:05 ThomazPom

Same problem for me today. Any news on this issue?

RomMad avatar Oct 03 '22 14:10 RomMad

There isn't any support for .emf files, but there is a workaround:

  • Change the extension of your .docx template to .zip
  • Unzip it into a directory
  • Search the XML files for the .emf file causing the issue (I used VSCode Find in Folder)
  • Save the .emf file as .jpeg
  • Update the XML file that references the .emf file so it points to the .jpeg
  • Compress the contents of the directory again
  • Change the extension from .zip back to .docx and try again.

gurpreetbhatoa avatar Oct 14 '22 07:10 gurpreetbhatoa

Workaround in code: PHPWord includes template processing for this.

include 'vendor/autoload.php';
$templateProcessor = new \PhpOffice\PhpWord\TemplateProcessor('test2.docx');
$templateProcessor->setValue('name', 'myvar');
$templateProcessor->saveAs('./xx.docx');

https://phpword.readthedocs.io/en/latest/templates-processing.html https://stackoverflow.com/a/53039632/4693790

You can avoid template processing if all you need is to replace the .emf references.

You could write a prepareDocxReplaceEMF($docxPath) function that performs all of the following actions on a docx file before you work with PHPWord. Renaming the docx to zip is not needed.

Use PHP ZipArchive to extract "YOURDOC.docx/word/_rels/document.xml.rels" https://www.php.net/manual/en/ziparchive.extractto.php

Replace EMF references in file https://stackoverflow.com/a/69155428/4693790

Use PHP ZipArchive to zip document.xml.rels back https://www.php.net/manual/en/ziparchive.addfile.php

Use PHP ZipArchive to extract emf file https://www.php.net/manual/en/ziparchive.extractto.php

Use ImageMagick to convert the EMF FILE https://imagemagick.org/script/formats.php https://www.php.net/manual/fr/book.imagick.php

Use PHP ZipArchive to zip jpeg file back https://www.php.net/manual/en/ziparchive.addfile.php
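The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name comes from the comment, but its body, the $convert parameter and the choice to edit the archive in place (instead of extracting to disk) are hypothetical. The EMF-to-JPEG conversion is left to a caller-supplied callable, because whether ImageMagick can decode EMF depends on how it was built.

```php
<?php
/**
 * Hypothetical helper: replace every word/media/*.emf entry in a .docx with
 * a converted JPEG and update word/_rels/document.xml.rels to match.
 *
 * $convert is a caller-supplied callable: EMF bytes in, JPEG bytes out.
 * Returns the number of images replaced.
 */
function prepareDocxReplaceEMF(string $docxPath, callable $convert): int
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath as a zip archive");
    }

    $relsPath = 'word/_rels/document.xml.rels';
    $rels = $zip->getFromName($relsPath);
    if ($rels === false) {
        $zip->close();
        throw new RuntimeException("$relsPath not found in $docxPath");
    }

    // Collect the EMF entries first, so the archive is not modified while it is scanned.
    $emfNames = [];
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $name = $zip->getNameIndex($i);
        if (preg_match('#^word/media/[^/]+\.emf$#i', $name)) {
            $emfNames[] = $name;
        }
    }

    foreach ($emfNames as $name) {
        $jpegName = substr($name, 0, -4) . '.jpeg';
        $zip->addFromString($jpegName, $convert($zip->getFromName($name)));
        $zip->deleteName($name);
        // Relationship targets are relative to word/, e.g. "media/image1.emf".
        $rels = str_replace(substr($name, 5), substr($jpegName, 5), $rels);
    }

    if ($emfNames !== []) {
        // addFromString() replaces the existing entry of the same name.
        $zip->addFromString($relsPath, $rels);
    }
    $zip->close();

    return count($emfNames);
}
```

A converter could wrap Imagick (readImageBlob(), setImageFormat('jpeg'), getImageBlob()), provided the local ImageMagick build can read EMF. The sketch does not touch [Content_Types].xml, so Word itself may still need a jpeg entry declared there even if PHPWord loads the result.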

ThomazPom avatar Oct 14 '22 09:10 ThomazPom

Workaround that worked for me

    // Note: this replaces every image in the document (not only EMF) with the placeholder.
    private function removeImageReferences($zip, $placeholderImagePath)
    {
        $relsPath = 'word/_rels/document.xml.rels';
        $relsContent = $zip->getFromName($relsPath);

        $relsXml = new SimpleXMLElement($relsContent);
        $imagePaths = [];
        // Defined before the loop so it is also set when the document has no images
        $placeholderImageTarget = 'media/placeholder.png';

        foreach ($relsXml->Relationship as $relationship) {
            if (strpos((string) $relationship['Type'], 'image') !== false) {
                // Store the original image path
                $imagePaths[] = 'word/' . $relationship['Target'];

                // Replace the image target with a placeholder image reference
                $relationship['Target'] = $placeholderImageTarget;
            }
        }

        // Update the relationships file
        $zip->deleteName($relsPath);
        $zip->addFromString($relsPath, $relsXml->asXML());

        // Delete the original image files
        foreach ($imagePaths as $imagePath) {
            $zip->deleteName($imagePath);
        }

        // Add the placeholder image to the zip archive
        $zip->addFile($placeholderImagePath, 'word/' . $placeholderImageTarget);
    }


    private function getPlaceholderImage()
    {
        $placeholderImagePath = 'placeholder.png';

        if (!Storage::disk('local')->exists($placeholderImagePath)) {
            $width = 1;
            $height = 1;
            $color = [255, 255, 255]; // RGB value for white color
            $image = imagecreatetruecolor($width, $height);
            $color = imagecolorallocate($image, $color[0], $color[1], $color[2]);
            imagefilledrectangle($image, 0, 0, $width - 1, $height - 1, $color);
            ob_start();
            imagepng($image);
            $imageData = ob_get_contents();
            ob_end_clean();
            Storage::disk('local')->put($placeholderImagePath, $imageData);
        }

        return storage_path('app/' . $placeholderImagePath);
    }

Then

            $tempFilePath = tempnam(sys_get_temp_dir(), 'doc');
            file_put_contents($tempFilePath, $response->getBody()->getContents());

            $zip = new ZipArchive();
            $placeholderImagePath = $this->getPlaceholderImage();

            $zip->open($tempFilePath);
            $this->removeImageReferences($zip, $placeholderImagePath);
            $zip->close();

            $phpWord = IOFactory::load($tempFilePath);

user3470 avatar Apr 19 '23 21:04 user3470

Since this seems unlikely to be fixed any time soon, given what looks like poor support for EMF images in PHP, would it be worth catching this error and replacing the image with a placeholder "image can't be displayed" image or message?

Then at least the library could be used for documents that contain an EMF image.

websuasive avatar May 25 '23 12:05 websuasive

EMF specification:

https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-emf/91c257d7-c39d-4a36-9b1f-63e3f73d30ca?redirectedfrom=MSDN

thomasb88 avatar Sep 12 '23 10:09 thomasb88

PHP's getimagesize and getimagesizefromstring accept the formats listed here: https://www.php.net/manual/fr/image.constants.php

EMF is not among them (nor is SVG).

So this could be a PHP feature request, but in the meantime we could try to implement it in PHPWord, in the same way PHP does.

In the PHP source code:

    PHP_FUNCTION(getimagesize)
    {
        php_getimagesize_from_any(INTERNAL_FUNCTION_PARAM_PASSTHRU, FROM_PATH);
    }
    /* }}} */

    /* {{{ Get the size of an image as 4-element array */
    PHP_FUNCTION(getimagesizefromstring)
    {
        php_getimagesize_from_any(INTERNAL_FUNCTION_PARAM_PASSTHRU, FROM_DATA);
    }

It then gets the stream and calls php_getimagesize_from_stream.

To find out which kind of file it is, that function then calls php_getimagetype.

For each known type, it reads a specific number of leading bytes and compares them with the corresponding signature.

For example, for JPEG, the first 3 bytes should be:

    PHPAPI const char php_sig_jpg[3] = {(char) 0xff, (char) 0xd8, (char) 0xff};

Then it applies an image-type-specific function to get the image size. For example, for the PSD image type:

    static struct gfxinfo *php_handle_psd (php_stream * stream)
    {
        struct gfxinfo *result = NULL;
        unsigned char dim[8];

        if (php_stream_seek(stream, 11, SEEK_CUR))
            return NULL;

        if (php_stream_read(stream, (char*)dim, sizeof(dim)) != sizeof(dim))
            return NULL;

        result = (struct gfxinfo *) ecalloc(1, sizeof(struct gfxinfo));
        result->height   =  (((unsigned int)dim[0]) << 24) + (((unsigned int)dim[1]) << 16) + (((unsigned int)dim[2]) << 8) + ((unsigned int)dim[3]);
        result->width    =  (((unsigned int)dim[4]) << 24) + (((unsigned int)dim[5]) << 16) + (((unsigned int)dim[6]) << 8) + ((unsigned int)dim[7]);

        return result;
    }

Or for a BMP file:

    static struct gfxinfo *php_handle_bmp (php_stream * stream)
    {
        struct gfxinfo *result = NULL;
        unsigned char dim[16];
        int size;

        if (php_stream_seek(stream, 11, SEEK_CUR))
            return NULL;

        if (php_stream_read(stream, (char*)dim, sizeof(dim)) != sizeof(dim))
            return NULL;

        size   = (((unsigned int)dim[ 3]) << 24) + (((unsigned int)dim[ 2]) << 16) + (((unsigned int)dim[ 1]) << 8) + ((unsigned int) dim[ 0]);
        if (size == 12) {
            result = (struct gfxinfo *) ecalloc (1, sizeof(struct gfxinfo));
            result->width    =  (((unsigned int)dim[ 5]) << 8) + ((unsigned int) dim[ 4]);
            result->height   =  (((unsigned int)dim[ 7]) << 8) + ((unsigned int) dim[ 6]);
            result->bits     =  ((unsigned int)dim[11]);
        } else if (size > 12 && (size <= 64 || size == 108 || size == 124)) {
            result = (struct gfxinfo *) ecalloc (1, sizeof(struct gfxinfo));
            result->width    =  (((unsigned int)dim[ 7]) << 24) + (((unsigned int)dim[ 6]) << 16) + (((unsigned int)dim[ 5]) << 8) + ((unsigned int) dim[ 4]);
            result->height   =  (((unsigned int)dim[11]) << 24) + (((unsigned int)dim[10]) << 16) + (((unsigned int)dim[ 9]) << 8) + ((unsigned int) dim[ 8]);
            result->height   =  abs((int32_t)result->height);
            result->bits     =  (((unsigned int)dim[15]) <<  8) +  ((unsigned int)dim[14]);
        } else {
            return NULL;
        }

        return result;
    }

So we could implement a glue function that relies on the file extension (.xxx) or on the leading-bytes signature for EMF, and then reads the relevant fields as laid out in the specification.
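A minimal sketch of the "first bytes" check, assuming the header layout from the EMF specification: the EMR_HEADER record type 0x00000001 at byte offset 0 and the record signature 0x464D4520 at byte offset 40, both little-endian. The function name looksLikeEmf is hypothetical and not part of PHP or PHPWord.

```php
<?php
/**
 * Hypothetical helper: guess whether $bytes is the start of an EMF file.
 * Checks the EMR_HEADER record type at offset 0 and the " EMF" record
 * signature at offset 40.
 */
function looksLikeEmf(string $bytes): bool
{
    if (strlen($bytes) < 44) {
        return false;
    }

    // 'V' reads an unsigned 32-bit little-endian integer at the given offset.
    $type = unpack('V', $bytes, 0)[1];
    $signature = unpack('V', $bytes, 40)[1];

    return $type === 1 && $signature === 0x464D4520;
}

// Illustrative buffers: a hand-built EMF-like header and the JPEG signature.
$emfLike = pack('V2', 1, 88) . str_repeat("\0", 32) . pack('V', 0x464D4520) . str_repeat("\0", 44);
$jpegLike = "\xff\xd8\xff" . str_repeat("\0", 85);

$emfDetected = looksLikeEmf($emfLike);
$jpegDetected = looksLikeEmf($jpegLike);
```

Checking the signature as well as the record type keeps the test from matching arbitrary files that merely start with the bytes 01 00 00 00.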

More precisely:

"1.3.1 Metafile Structure

An EMF metafile begins with a EMR_HEADER record (section 2.3.4.2), which includes the metafile version, its size, the resolution of the device on which the picture was created, and it ends with an EMR_EOF record (section 2.3.4.1). Between them are records that specify the rendering of the image."

And then:

"2.3.4.2 EMR_HEADER Record Types

The EMR_HEADER record is the starting point of an EMF metafile. It specifies properties of the device on which the image in the metafile was recorded; this information in the header record makes it possible for EMF metafiles to be independent of any specific output device. The following are the EMR_HEADER record types.

  • EmfMetafileHeader (section 2.3.4.2.1): The original EMF header record.
  • EmfMetafileHeaderExtension1 (section 2.3.4.2.2): The header record defined in the first extension to EMF, which added support for OpenGL records and an optional internal pixel format descriptor.<62>
  • EmfMetafileHeaderExtension2 (section 2.3.4.2.3): The header record defined in the second extension to EMF, which added the capability of measuring display dimensions in micrometers.<63>

EMF metafiles SHOULD be created with an EmfMetafileHeaderExtension2 header record. The generic structure of EMR_HEADER records is specified as follows.

...

Type (4 bytes): An unsigned integer that identifies this record type as EMR_HEADER. This value is 0x00000001

...

The value of the Size field can be used to distinguish between the different EMR_HEADER record types listed earlier in this section. There are three possible headers:

  • The EmfMetafileHeader record. The fixed-size part of this header is 88 bytes, and it contains a Header object (section 2.2.9).
  • The EmfMetafileHeaderExtension1 record. The fixed-size part of this header is 100 bytes, and it contains a Header object and a HeaderExtension1 object (section 2.2.10).
  • The EmfMetafileHeaderExtension2 record. The fixed-size part of this header is 108 bytes, and it contains a Header object, a HeaderExtension1 object, and a HeaderExtension2 object (section 2.2.11)."

Then in 2.2.9:

"Bounds (16 bytes): A RectL object ([MS-WMF] section 2.2.2.19) that specifies the rectangular inclusive-inclusive bounds in logical units of the smallest rectangle that can be drawn around the image stored in the metafile."

Which leads us to https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-wmf/4813e7fd-52d0-4f42-965f-228c8b7488d2 section 2.2.2.19:

"2.2.2.19 RectL Object

The RectL Object defines a rectangle.

...

  • Left (4 bytes): A 32-bit signed integer that defines the x coordinate, in logical coordinates, of the upper-left corner of the rectangle.
  • Top (4 bytes): A 32-bit signed integer that defines the y coordinate, in logical coordinates, of the upper-left corner of the rectangle.
  • Right (4 bytes): A 32-bit signed integer that defines the x coordinate, in logical coordinates, of the lower-right corner of the rectangle.
  • Bottom (4 bytes): A 32-bit signed integer that defines y coordinate, in logical coordinates, of the lower-right corner of the rectangle.

A rectangle defined with a RectL Object is filled up to, but not including, the right column and bottom row of pixels"
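To make the layout concrete, here is a small sketch that reads the Bounds rectangle with unpack(), assuming Type and Size occupy the first 8 bytes and Bounds (Left, Top, Right, Bottom) the next 16. The function name emfSizeFromHeader is hypothetical, and the sketch assumes 64-bit PHP. Because the EMF header describes Bounds as inclusive-inclusive, both edges count, hence the "+ 1".

```php
<?php
/**
 * Hypothetical helper: read width and height from the Bounds RectL of an
 * EMR_HEADER record.
 *
 * @return int[] [width, height] in logical units
 */
function emfSizeFromHeader(string $bytes): array
{
    if (strlen($bytes) < 24) {
        throw new InvalidArgumentException('Too short for an EMR_HEADER record');
    }

    // 'V' reads an unsigned 32-bit little-endian integer.
    // Layout: Type (4), Size (4), then Bounds as Left, Top, Right, Bottom (4 each).
    $h = unpack('Vtype/Vsize/Vleft/Vtop/Vright/Vbottom', $bytes);
    if ($h['type'] !== 1) {
        throw new InvalidArgumentException('First record is not EMR_HEADER');
    }

    // RectL fields are signed, so reinterpret values at or above 2^31 as negative
    // (assumes 64-bit PHP, where unpack('V') yields non-negative integers).
    $signed = function (int $v): int {
        return $v >= 0x80000000 ? $v - 0x100000000 : $v;
    };

    $width = $signed($h['right']) - $signed($h['left']) + 1;
    $height = $signed($h['bottom']) - $signed($h['top']) + 1;

    return [$width, $height];
}

// Example: inclusive bounds from (0, 0) to (99, 49) describe a 100 x 50 image.
$header = pack('V2', 1, 88) . pack('V4', 0, 0, 99, 49);
list($width, $height) = emfSizeFromHeader($header);
```

Reading the fields with unpack() avoids the hex-string reversal otherwise needed for little-endian data, at the cost of handling the sign manually.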

thomasb88 avatar Sep 22 '23 12:09 thomasb88

Hi Progi1984,

I haven't had time to install the whole environment needed to test against the project standards, but I wrote a glue for getimagesize that works in my environment.

As the specification is a bit painful to read, I'm copying the function below, hoping it helps you with this ticket.

    /**
     * Get image size from filename (glue over PHP, which doesn't handle all image file types, like EMF).
     *
     * First try the PHP function getimagesize, then fall back to a custom glue for unsupported formats.
     * For unsupported formats, also check the filename extension.
     *
     * @param string $filename
     * @param null|array $image_info
     *
     * @return null|array
     */
    private function getImageSizeGlue($filename, &$image_info = null)
    {
        $imageData = @getimagesize($filename, $image_info);
        if (!is_array($imageData)) {
            $image_path_parts = pathinfo($this->source);
            $source_extension = (array_key_exists('extension', $image_path_parts)) ? $image_path_parts['extension'] : '';
            $hexaImageString = bin2hex($this->getImageString());
            // As of the EMF specification, chapter 1.3.3, data in metafile records is stored in little-endian format.
            // Read $length hex digits at hex offset $offset and return them with the byte order reversed.
            $readField = function ($offset, $length) use ($hexaImageString) {
                return bin2hex(implode(array_reverse(str_split(hex2bin(substr($hexaImageString, $offset, $length))))));
            };
            switch ($source_extension) {
                case 'emf':
                    $tag_format = $readField(0, 8);
                    if ('00000001' != $tag_format) {
                        throw new InvalidImageException(sprintf('Invalid %s image format: Bad EMR_HEADER tag (%s instead of 00000001)', $source_extension, $tag_format));
                    }
                    $existing_format_version = ['00000058' => 'EmfMetafileHeader', '00000064' => 'EmfMetafileHeaderExtension1', '0000006c' => 'EmfMetafileHeaderExtension2'];
                    $format_version = $readField(8, 8);
                    if (!array_key_exists($format_version, $existing_format_version)) {
                        throw new InvalidImageException(sprintf('Invalid %s image format: Invalid Header Size (%s)', $source_extension, $format_version));
                    }
                    $record_signature = $readField(80, 8);
                    if ('464d4520' != $record_signature) {
                        throw new InvalidImageException(sprintf('Invalid %s image format: Bad ENHMETA_SIGNATURE Record Signature (%s)', $source_extension, $record_signature));
                    }
                    $emf_version = $readField(88, 8);
                    if ('00010000' != $emf_version) {
                        throw new InvalidImageException(sprintf('Invalid %s image format: Bad Version (%s)', $source_extension, $emf_version));
                    }
                    $header_reserved = $readField(116, 4);
                    if ('0000' != $header_reserved) {
                        throw new InvalidImageException(sprintf('Invalid %s image format: Bad Reserved Tag (%s)', $source_extension, $header_reserved));
                    }
                    if (hexdec($format_version) > 88) {
                        // bOpenGL is the third 4-byte field of HeaderExtension1, i.e. byte offset 96 (hex offset 192)
                        $bOpenGL = $readField(192, 8);
                        if (!in_array($bOpenGL, ['00000000', '00000001'], true)) {
                            throw new InvalidImageException(sprintf('Invalid %s image format: Bad OpenGL Tag (%s)', $source_extension, $bOpenGL));
                        }
                    }
                    // Bounds is a RectL object (Left, Top, Right, Bottom) of signed 32-bit integers.
                    // As of the EMF specification, section 2.2.9, the bounds are inclusive-inclusive, hence the "+ 1".
                    $image_bounds = [];
                    foreach ([16, 24, 32, 40] as $bound_offset) {
                        $bound_value = hexdec($readField($bound_offset, 8));
                        $image_bounds[] = ($bound_value >= 0x80000000) ? $bound_value - 0x100000000 : $bound_value;
                    }
                    $width_in_pixels = abs($image_bounds[2] - $image_bounds[0]) + 1;
                    $height_in_pixels = abs($image_bounds[3] - $image_bounds[1]) + 1;
                    $image_type = self::IMAGETYPE_EMF;
                    $size_string = sprintf('width="%s" height="%s"', $width_in_pixels, $height_in_pixels);
                    // Same order as getimagesize: width, height, type, attribute string
                    $imageData = [$width_in_pixels, $height_in_pixels, $image_type, $size_string];
                    break;
                default:
                    throw new InvalidImageException(sprintf('Unsupported image format: %s from file %s', $source_extension, $this->source));
            }
        }

        return $imageData;
    }

thomasb88 avatar Sep 25 '23 15:09 thomasb88

But this only solves the checkImage problem.

There is another problem in parseImage, in PhpWord/Shared/Html.php at line 960.

thomasb88 avatar Sep 26 '23 02:09 thomasb88

My bad, the image type should also be modified.

thomasb88 avatar Sep 26 '23 04:09 thomasb88

I got around this a year ago, and it has never bothered me since. I prepare every docx using method 2, which I described here: https://github.com/PHPOffice/PHPWord/issues/1480#issuecomment-1278708204

ThomazPom avatar Sep 26 '23 06:09 ThomazPom

Well, EMF to JPEG is not a lossless conversion.

That's why I updated PHPWord to handle EMF images. But you're right: if image quality doesn't matter to you, your solution is a good workaround.

thomasb88 avatar Sep 26 '23 06:09 thomasb88

Does anyone have a file containing an EMF/WMF image, please?

Progi1984 avatar Sep 26 '23 17:09 Progi1984

I have one, but it belongs to my customer, so it can't be shared as is.

So I used the trial version of the Metafile Companion software to produce a random image, which I inserted into a random docx file: Docx with Emf Image for Test.docx

thomasb88 avatar Sep 27 '23 08:09 thomasb88

Hope it helps

thomasb88 avatar Sep 27 '23 08:09 thomasb88