msphpsql icon indicating copy to clipboard operation
msphpsql copied to clipboard

An error occurred translating string for a field to UTF-8 retrieving truncated unicode character

Open maxiwheat opened this issue 5 years ago • 2 comments

PHP Driver version or file name

5.6.1

SQL Server version

2014

Client operating system

Red Hat 7

PHP version

7.2.18

Microsoft ODBC Driver version

17

Table schema

Problem description

When I try to retrieve data stored in a column that contains a truncated 4 bytes UTF-8 (or USC-2) character, the driver generates an error: An error occurred translating string for a field to UTF-8: Error code 0x0 and the row cannot be fetched at all.

As I cannot always know in advance what the data would be in a column of a table, if it contains an invalid character, it prevents my code from getting the data.

Expected behavior and actual behavior

I expect the truncated string/chars to be return as is, replaced with a question mark or at least completely removed. This query runs okay, without any error in SSMS and the string appears truncated (the second character is replaced by a question mark like this �)

Repro code or steps to reproduce

    $connInfo = array(
        'Database' => 'MyDatabase',
        'UID' => 'MyUsername',
        'PWD' => 'MyPassword',
        'LoginTimeout' => 5,
        'CharacterSet' => 'UTF-8'
    );

    $conn = sqlsrv_connect('myserver.mydomain.com', $connInfo);
    
    // The christmas tree emoji is 4 bytes, if we store two of them in a NVARCHAR(3), which holds 6 bytes, the second christmas tree character is truncated
    $sql = "SET NOCOUNT ON;
            DECLARE @val NVARCHAR(3) = N'🎄🎄';
            CREATE TABLE #tmpTest (testCol NVARCHAR(3));
            INSERT INTO #tmpTest (testCol) VALUES (@val);
            SELECT * from #tmpTest;";

    $stmt = sqlsrv_query($conn, $sql);

    $row = sqlsrv_fetch_array($stmt, SQLSRV_FETCH_ASSOC);

    $errors = sqlsrv_errors(SQLSRV_ERR_ERRORS);
    // $errors contain "An error occurred translating string for a field to UTF-8: Error code 0x0"
    var_dump($row, $errors);

maxiwheat avatar May 16 '19 17:05 maxiwheat

@maxiwheat Thank you for reporting this. I am able to reproduce it; we will keep this thread updated.

david-puglielli avatar May 16 '19 20:05 david-puglielli

@maxiwheat this is by design. We have done some tests in our drivers -- insertion of invalid characters fails. The same happens with MySQL and PostgreSQL.

On the other hand, with fetching, we might consider the possibility of being more lenient. Since this changes the existing behavior, the default is the strict way but we might provide an option for ignoring errors such that fetching can continue, which means that we allow displaying bad data (unpredictable).

We will leave this issue open and see what the other users think.

yitam avatar Jun 03 '19 23:06 yitam