Special characters wrong encoding

Open nipec94 opened this issue 9 years ago • 12 comments

I have a problem with the encoding of special characters. If I send e.g. ß from the client to the server and print it to the console, I only see: Ò 

I already tried converting all special characters to HTML entities on the client and converting the entities back to special characters on the server, but that doesn't work either.

nipec94 avatar Feb 20 '17 21:02 nipec94

All special chars I print to the console from the server-script have wrong encodings! So it's definitely a PHP/Server-Problem.

nipec94 avatar Feb 20 '17 21:02 nipec94

This likely has nothing to do with HTML; it's a character encoding issue. The WebSocket specification requires text messages to be UTF-8, so special characters shouldn't be a problem.

Make sure the ini setting mbstring.func_overload is set to 0.

Also check your default_charset and any iconv settings.
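
For reference, here is a minimal sketch of what the client should put on the wire for the characters in question, assuming Node.js (for its `Buffer` API). It encodes a string to UTF-8, as a WebSocket text message would carry it, and prints the bytes as hex. The helper name `utf8Hex` is made up for this illustration.

```javascript
// Hypothetical helper: hex dump of a string's UTF-8 encoding.
function utf8Hex(text) {
    return Buffer.from(text, 'utf8').toString('hex');
}

console.log(utf8Hex('ß'));     // c39f
console.log(utf8Hex('Ränge')); // 52c3a46e6765
```

If the server receives exactly these bytes, the transport is fine and the problem more likely lies in how the bytes are displayed or passed on.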

cboden avatar Feb 21 '17 13:02 cboden

I checked those config options. mbstring.func_overload is set to 0, and default_charset, input_encoding, output_encoding and internal_encoding are all set to utf8.

nipec94 avatar Feb 21 '17 20:02 nipec94

When I send special characters from the Server to the Client, everything is working fine:

server class:

<?php
class Server
{
    public function onOpen(ConnectionInterface $conn)
    {
        $conn->send('äüöß');
    }
}

client:

var ws = new WebSocket('...');
ws.onmessage = function (e) {
    console.log(e.data);
};

In the console I can see "äüöß".

But when I send special characters from the client to the server and use those strings for database statements, it won't work because special chars are converted to something cryptic.

E.g.: Database-Table: word

id | word

1 | Ränge

server:

<?php
class Server
{
    public function onMessage(ConnectionInterface $from, $msg)
    {
        $repo = $this->container->get('doctrine')->getEntityManager()->getRepository('AppBundle:Word');
        if ($repo->findOneByWord($msg) === null) {
            echo $msg;
            $from->send('Word does not exist!');
        }
    }
}

When I send "Ränge" from the client to the server, I get "Word does not exist!" as response and in the console I see "R�nge".

I already set all php.ini *-encoding options to UTF-8. The problem occurs on localhost and also on the production server.

I'm using Windows and cmd.exe to test the server locally. I already set the default code page of cmd.exe to UTF-8 with chcp 65001, but without success.
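
One way to narrow this down is to look at the raw bytes rather than at what the terminal renders. Below is a minimal sketch, assuming Node.js, that checks whether a byte sequence is valid UTF-8; `isValidUtf8` is a hypothetical helper. On the PHP side the analogous calls would be `bin2hex($msg)` and `mb_check_encoding($msg, 'UTF-8')` inside `onMessage`.

```javascript
// Hypothetical helper: true if the bytes decode as strict UTF-8.
function isValidUtf8(bytes) {
    try {
        // fatal: true makes decode() throw on malformed input
        // instead of silently substituting U+FFFD.
        new TextDecoder('utf-8', { fatal: true }).decode(bytes);
        return true;
    } catch (e) {
        return false;
    }
}

// "Ränge" encoded as UTF-8 (6 bytes) and as ISO-8859-1 (5 bytes).
const asUtf8 = Buffer.from('Ränge', 'utf8');
const asLatin1 = Buffer.from('Ränge', 'latin1');

console.log(asUtf8.toString('hex'), isValidUtf8(asUtf8));     // 52c3a46e6765 true
console.log(asLatin1.toString('hex'), isValidUtf8(asLatin1)); // 52e46e6765 false
```

If the server logs the 6-byte form, the message arrived intact and the garbled character most likely comes from the console. The failed lookup would then need a separate explanation, such as the character set of the database connection.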

nipec94 avatar Feb 22 '17 18:02 nipec94

What is the character set of the HTML page? Are you setting a value for the meta charset? <meta charset="utf-8">

mikealmond avatar Mar 30 '17 03:03 mikealmond

I'm having a similar issue. All PHP configuration related to character encoding is set to UTF-8. I placed the following lines at the top of the PHP file, just for good measure:

ini_set('default_charset', 'utf-8');
setlocale(LC_CTYPE, 'en_GB.UTF-8');
mb_internal_encoding("UTF-8");

I have <meta charset="utf-8" /> in my HTML's <head>.

I have attached two screenshots, the first is that of the JS Console log, and the second is the command-line output from the PHP file.

I created a wrapper around the WebSocket API. All it does is convert various data to JSON and send it through the socket. You can see the JSON being printed to the JS console in the screenshot below for demonstration. All looks good here.

[Screenshot: JS console log (2017-04-12 10_52_39-project)]

However, when the message is received by the WebSocket server, and printed using var_dump (or any other function that outputs to the buffer), the message is malformed.

[Screenshot: command-line output (2017-04-12 10_52_25-cmd - php websocket-server-new php)]

Note that in the command-line screenshot, when the fourth message is received and var_dump'ed, the prefix that var_dump prepends (the part that says string(59)) is also malformed. Perhaps this is just a side effect of the malformed text being printed, but I thought it was odd nonetheless.

Could this be related?

CaelanStewart avatar Apr 12 '17 10:04 CaelanStewart

It's likely your terminal or shell environment is not configured to UTF-8.

cboden avatar Apr 22 '17 16:04 cboden

@cboden, unfortunately, that is not the case.

I actually created a Stack Overflow post about the problem.

I originally used a different WebSocket library known as Hoa, but it had some other issues which meant I couldn't use it. While I was using Hoa, however, I had no issues with malformed characters.

Also, I tried buffering the output of the script, writing it to a file and opening it with a UTF-8-compatible text editor, and exactly the same issue with malformed characters occurred.

CaelanStewart avatar Apr 24 '17 15:04 CaelanStewart

Can you use Wireshark and take a screenshot of what the data looks like there?

Also, can you hard-code a UTF-8 character in your server (like ✓ and 😀) to send to clients? What does that look like in Wireshark and Chrome?

cboden avatar Aug 03 '18 15:08 cboden

@cboden

Hi there, I don't have the source code where this bug emerged any longer - nor the same local dev environment. Nor can I remember anything about the setup at the time.

Apologies for that, hopefully this was just some environment-specific occurrence limited to my setup.

CaelanStewart avatar Aug 06 '18 08:08 CaelanStewart

I'm having the opposite issue. Special characters are OK in a telnet terminal connected to the server. But I also put a MySQL statement in the server script to log the client->server messages, and the output in SQL is garbage. The database table is of course set to utf8_general_ci, and other PHP applications on the same server write to the same database correctly.
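
A common cause of this pattern, though not confirmed for this report, is a database connection that treats the incoming UTF-8 bytes as Latin-1 and encodes them again. Here is a minimal sketch of that double encoding and its reversal, assuming Node.js; `doubleEncode` and `repair` are hypothetical names.

```javascript
// Hypothetical: simulate UTF-8 bytes being misread as ISO-8859-1.
function doubleEncode(text) {
    return Buffer.from(text, 'utf8').toString('latin1');
}

// Hypothetical: reverse the misreading above.
function repair(garbled) {
    return Buffer.from(garbled, 'latin1').toString('utf8');
}

const garbled = doubleEncode('Ränge');
console.log(garbled);         // RÃ¤nge
console.log(repair(garbled)); // Ränge
```

If stored values look like `RÃ¤nge`, the character set of the connection (for example, set via `SET NAMES utf8mb4` in MySQL) would be the first thing to check.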

TechOverflow avatar Nov 21 '18 14:11 TechOverflow

Examine this https://jqcode.space/questions/special-characters-are-included-in-the-word-count explanation

jarielabbreviated avatar Jan 30 '25 00:01 jarielabbreviated

Hey! Thanks for reporting this issue 🙏

We're giving Ratchet some much-needed love and attention! As part of our issue cleanup initiative (#1100), we're reviewing and closing long-standing issues to prioritize what our community currently needs most.

The character encoding issues you're experiencing are typically related to inconsistent encoding handling across different layers of your application stack. Without knowing the specific source of your data and the exact encoding chain, it's difficult to pinpoint the root cause. Modern best practice is to use UTF-8 consistently throughout the entire application stack.

If this is still relevant to you, we'd love to continue this conversation in our GitHub Discussions. Just reference this issue when you post. If you want to support Ratchet, please consider sponsoring our work! ❤️

PaulRotmann avatar Jun 26 '25 07:06 PaulRotmann