PHPExcel

Read multiline cell returns escaped characters like _x000D_

Open andy128k opened this issue 7 years ago • 4 comments

When I read a cell that contains multiple lines of text, I receive a string containing `_x000D_`.

I have created a workaround, but I believe this should be handled in PHPExcel itself.

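    function unescape($string)
    {
        // Replace each _xHHHH_ escape with the character it encodes.
        return preg_replace_callback('/_x([0-9a-fA-F]{4})_/', function ($matches) {
            // hex2bin() yields big-endian bytes, so name the byte order explicitly;
            // plain 'UCS-2' is implementation-dependent.
            return iconv('UCS-2BE', 'UTF-8', hex2bin($matches[1]));
        }, $string);
    }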

Link to corresponding spec: https://msdn.microsoft.com/en-us/library/ff534667(v=office.12).aspx
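
For comparison, here is a minimal sketch of the same decoding that builds each character from its code point instead of converting raw bytes, so no byte order is involved. It assumes PHP 7.2+ with the mbstring extension; `decodeOoxmlEscapes` is a hypothetical name, not part of PHPExcel or PhpSpreadsheet.

```php
<?php
// Hypothetical helper (not a PHPExcel API): decodes _xHHHH_ escapes
// into UTF-8 characters.
function decodeOoxmlEscapes(string $value): string
{
    $decoded = preg_replace_callback(
        '/_x([0-9a-fA-F]{4})_/',
        function (array $m): string {
            // mb_chr() returns false when the code point cannot be
            // represented in UTF-8; keep the original text in that case.
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            return $char === false ? $m[0] : $char;
        },
        $value
    );

    // preg_replace_callback() returns null on a regex error.
    return $decoded ?? $value;
}

// Usage: a two-line cell as it might come back from the reader.
var_dump(decodeOoxmlEscapes("first line_x000D_\nsecond line") === "first line\r\nsecond line");
```

Because the regex makes a single pass and does not rescan replaced text, an escaped underscore such as `_x005F_x000D_` comes out as the literal text `_x000D_` rather than a carriage return.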

andy128k, Dec 29 '16 18:12

Thank you for reporting the issue. However, PHPExcel development for the next version has moved under its new name, PhpSpreadsheet. Would you please consider heading over to PhpSpreadsheet to contribute a patch that includes unit tests?

PowerKiKi, Dec 30 '16 02:12

@andy128k When using your workaround I experienced the following:

[image: oddcharacters]

mgilberties, Feb 02 '18 10:02

@mgilberties Maybe this is what is actually in your document? What do you get without applying my function?

andy128k, Feb 04 '18 22:02
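
One possible explanation for the odd characters, offered as a guess rather than something confirmed in this thread: `hex2bin` yields the two bytes in big-endian order, while an unqualified `UCS-2` label leaves the byte order to the iconv implementation. If the bytes are read as little-endian, `000D` becomes U+0D00 instead of a carriage return. The sketch below compares the explicit byte orders; the result for plain `UCS-2` varies by platform, so it is only printed.

```php
<?php
$bytes = hex2bin('000D'); // the two bytes 0x00 0x0D

// Explicit big-endian: U+000D, a carriage return (UTF-8 byte 0d).
$bigEndian = bin2hex(iconv('UCS-2BE', 'UTF-8', $bytes));

// Explicit little-endian: the same bytes read as U+0D00 (UTF-8 bytes e0 b4 80).
$littleEndian = bin2hex(iconv('UCS-2LE', 'UTF-8', $bytes));

// Unqualified label: the outcome depends on the iconv implementation.
$unqualified = bin2hex(iconv('UCS-2', 'UTF-8', $bytes));

echo "UCS-2BE: $bigEndian\nUCS-2LE: $littleEndian\nUCS-2:   $unqualified\n";
```

If the last line matches the `UCS-2LE` line on a given machine, naming `UCS-2BE` explicitly in the conversion is one way to rule this cause out.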

I came across this today, and I am using the following to convert those escape sequences to HTML entities:

    preg_replace('/_x([0-9a-fA-F]{4})_/', '&#x$1;', $string);

You can take this a step further to render the actual character:

    html_entity_decode(preg_replace('/_x([0-9a-fA-F]{4})_/', '&#x$1;', $string));

And if, like me, you don't really want line feeds in your output, you can simply replace them with spaces, since more likely than not they appear in the midst of some XML inside a cell, where whitespace is somewhat arbitrary:

    preg_replace('/\s+/', ' ', html_entity_decode(preg_replace('/_x([0-9a-fA-F]{4})_/', '&#x$1;', $string)));
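
A caveat with the `html_entity_decode` route, sketched here under the assumption that cell text may itself contain literal entity-like text such as `&amp;`: the decode step converts that text too, not only the `_xHHHH_` escapes. The sample string is made up for illustration.

```php
<?php
// Made-up cell text: a literal "&amp;" typed by the user, plus an escaped CR.
$cell = 'R &amp; D_x000D_';

// The entity-based approach: rewrite escapes as numeric entities, then decode.
$viaEntities = html_entity_decode(
    preg_replace('/_x([0-9a-fA-F]{4})_/', '&#x$1;', $cell)
);

// The literal "&amp;" was decoded along with the escape.
var_dump($viaEntities === "R & D\r");
```

When the cell text has to be preserved exactly, a callback that converts only the matched `_xHHHH_` sequences avoids touching other entities.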

billynoah, Dec 05 '18 05:12