PHPExcel
Read multiline cell returns escaped characters like _x000D_
When I read a cell that contains multiple lines of text, I receive a string containing _x000D_.
I have created a workaround, but I believe this should be handled in PHPExcel itself.
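function unescape($string)
{
    // Replace each _xHHHH_ escape with the UTF-8 character it encodes.
    return preg_replace_callback('/_x([0-9a-fA-F]{4})_/', function ($matches) {
        // The hex digits are big-endian; a bare 'UCS-2' may be read in host byte order by some iconv builds.
        return iconv('UCS-2BE', 'UTF-8', hex2bin($matches[1]));
    }, $string);
}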
Link to corresponding spec: https://msdn.microsoft.com/en-us/library/ff534667(v=office.12).aspx
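For anyone who prefers not to depend on iconv, here is a minimal sketch of the same idea using mb_chr() (PHP 7.2+ with the mbstring extension). The function name unescapeXstring is hypothetical and not part of PHPExcel. As far as I understand the linked spec, a literal "_x" sequence is itself stored with its underscore escaped as _x005F_, and a single left-to-right pass handles that case without double-decoding.

```php
<?php
// Hypothetical helper, not part of PHPExcel: decodes _xHHHH_ escapes to UTF-8.
function unescapeXstring(string $value): string
{
    $result = preg_replace_callback(
        '/_x([0-9a-fA-F]{4})_/',
        static function (array $m): string {
            // hexdec() yields the code point; mb_chr() encodes it as UTF-8.
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // Leave the escape untouched if it is not a valid code point
            // (for example a lone surrogate).
            return $char === false ? $m[0] : $char;
        },
        $value
    );

    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $value;
}

$cell = "first line_x000D_\nsecond line";
var_dump(unescapeXstring($cell) === "first line\r\nsecond line"); // bool(true)
```

Because the callback works on code points directly, the result does not depend on the byte order the platform's iconv assumes.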
Thank you for reporting the issue. However, PHPExcel development for the next version has moved under its new name, PhpSpreadsheet. Would you please consider heading over to PhpSpreadsheet to contribute a patch that includes unit tests?
@andy128k When using your workaround I experienced the following:
@mgilberties Maybe this is what is actually in your document? What do you get without applying my function?
I came across this today and am using the following to convert those escape sequences to HTML entities:
preg_replace('/_x([0-9a-fA-F]{4})_/', '&#x$1;', $string);
You can take this a step further to render the actual character:
html_entity_decode(preg_replace('/_x([0-9a-fA-F]{4})_/', '&#x$1;', $string));
And if, like me, you don't really want line feeds in your output, you can simply replace them with spaces. More likely than not they appear in the midst of some XML inside a cell, where whitespace is somewhat arbitrary:
preg_replace('/\s+/',' ', html_entity_decode(preg_replace('/_x([0-9a-fA-F]{4})_/', '&#x$1;', $string)));
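One thing to watch with the html_entity_decode() route is that it also decodes any other entities already in the cell, so XML content such as "R&amp;D" would come out as "R&D". The sketch below is one way to avoid that by decoding only the _xHHHH_ escapes before collapsing whitespace. The name flattenCellText is hypothetical, and it assumes PHP 7.2+ with mbstring.

```php
<?php
// Hypothetical helper: decode _xHHHH_ escapes only, then flatten whitespace.
function flattenCellText(string $value): string
{
    $decoded = preg_replace_callback(
        '/_x([0-9a-fA-F]{4})_/',
        static function (array $m): string {
            $char = mb_chr(hexdec($m[1]), 'UTF-8');
            // Keep the original escape if the code point is invalid.
            return $char === false ? $m[0] : $char;
        },
        $value
    ) ?? $value;

    // Collapse each whitespace run (including \r\n) into a single space.
    return preg_replace('/\s+/', ' ', $decoded) ?? $decoded;
}

echo flattenCellText("<a>_x000D_\n  <b>R&amp;D</b>_x000D_\n</a>"), "\n";
// prints: <a> <b>R&amp;D</b> </a>
```

The existing &amp; entity is left alone, so the flattened text is still valid XML.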