Special characters wrong encoding
I have a problem with the encoding of special characters. If I send e.g. ß from the client to the server and print it to the console, I only see: Ò
I already tried replacing all special characters with HTML entities on the client and converting the entities back to special characters on the server side, but that doesn't work either.
All special characters I print to the console from the server script have the wrong encoding! So it's definitely a PHP/server problem.
This likely has nothing to do with HTML; it is a character encoding issue. The WebSocket specification states that text messages have to be UTF-8, so special characters shouldn't be a problem.
Make sure the ini setting mbstring.func_overload is set to 0.
Also check your default_charset and any iconv settings.
I checked those config options. mbstring.func_overload is set to 0, and default_charset, input_encoding, output_encoding and internal_encoding are all set to utf8.
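Since the server script is started from the command line, the CLI may be reading a different php.ini than the web server does, so it can help to print the values the running process actually sees. A minimal sketch; the helper name `readEncodingSettings` is made up for illustration, and `mbstring.func_overload` no longer exists on PHP 8, where `ini_get` returns false for it:

```php
<?php
// Hypothetical helper: collect encoding-related ini values as seen by this process.
function readEncodingSettings(): array
{
    $names = [
        'default_charset',
        'mbstring.func_overload', // removed in PHP 8.0
        'mbstring.internal_encoding',
        'input_encoding',
        'output_encoding',
        'internal_encoding',
    ];
    $settings = [];
    foreach ($names as $name) {
        $value = ini_get($name);
        // ini_get() returns false when the option is unknown to this PHP build.
        $settings[$name] = ($value === false) ? '(not available)' : $value;
    }
    return $settings;
}

// php_ini_loaded_file() returns the path of the loaded php.ini, or false if none.
echo 'Loaded php.ini: ', php_ini_loaded_file() ?: '(none)', PHP_EOL;
foreach (readEncodingSettings() as $name => $value) {
    printf("%-28s %s\n", $name, $value);
}
```

Running this with the same PHP binary that starts the WebSocket server shows whether the settings checked in php.ini are the ones in effect for that process.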
When I send special characters from the Server to the Client, everything is working fine:
server class:
<?php
class Server
{
    public function onOpen(ConnectionInterface $conn)
    {
        $conn->send('äüöß');
    }
}
client:
var ws = new WebSocket('...');
ws.onmessage = function (e) {
    console.log(e.data);
};
In the console I can see "äüöß".
But when I send special characters from the client to the server and use those strings for database statements, it won't work because special chars are converted to something cryptic.
E.g.: Database-Table: word
id | word
1 | Ränge
server:
<?php
class Server
{
    public function onMessage(ConnectionInterface $from, $msg)
    {
        $repo = $this->container->get('doctrine')->getEntityManager()->getRepository('AppBundle:Word');
        // No matching row: echo the received message for debugging and tell the client.
        if ($repo->findOneByWord($msg) === null) {
            echo $msg;
            $from->send('Word does not exist!');
        }
    }
}
When I send "Ränge" from the client to the server, I get "Word does not exist!" as response and in the console I see "R�nge".
I already set all php.ini *-encoding options to UTF-8. The problem occurs on localhost and also on the production server.
I'm using Windows and cmd.exe to test the server locally. I already set the code page of cmd.exe to UTF-8 with chcp 65001, but without success.
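One way to tell a console display problem apart from real data corruption is to look at the raw bytes rather than the rendered characters. A minimal sketch, assuming a helper named `describeMessage` (hypothetical, not part of the WebSocket library) that could be called on `$msg` inside `onMessage`; it uses `preg_match` with the `u` modifier as the validity check so that it does not depend on the mbstring extension:

```php
<?php
// Hypothetical helper: summarise the raw bytes of a received message.
function describeMessage(string $msg): string
{
    // preg_match() with the /u modifier returns false for invalid UTF-8 input.
    $valid = (preg_match('//u', $msg) === 1) ? 'valid UTF-8' : 'NOT valid UTF-8';
    return sprintf('%d bytes, %s, hex: %s', strlen($msg), $valid, bin2hex($msg));
}

// "Ränge" encoded as UTF-8: ä is the two bytes c3 a4.
echo describeMessage("R\u{00E4}nge"), PHP_EOL;
// → 6 bytes, valid UTF-8, hex: 52c3a46e6765

// "Ränge" with ä as the single ISO-8859-1 byte e4.
echo describeMessage("R\xE4nge"), PHP_EOL;
// → 5 bytes, NOT valid UTF-8, hex: 52e46e6765
```

If the message arrives as `52c3a46e6765`, the bytes are correct UTF-8 and the garbling is introduced later, for example when the console renders the output. Anything else points at the data being altered before it reaches `onMessage`.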
What is the character set of the HTML page? Are you setting a value for the meta charset?
<meta charset="utf-8">
I'm having a similar issue. All PHP configuration related to character encoding is set to UTF-8. I placed the following lines at the top of the PHP file, just for good measure:
ini_set('default_charset', 'utf-8');
setlocale(LC_CTYPE, 'en_GB.UTF-8');
mb_internal_encoding("UTF-8");
I have <meta charset="utf-8" /> in my HTML's <head>.
I have attached two screenshots, the first is that of the JS Console log, and the second is the command-line output from the PHP file.
I created a wrapper around the WebSocket API; all it does is convert various data to JSON and send it through the socket. You can see the JSON being printed to the JS console in the screenshot below for demonstration. All looks good here.

However, when the message is received by the WebSocket server, and printed using var_dump (or any other function that outputs to the buffer), the message is malformed.

Note in the screenshot of the command-line output that when the fourth message is received and passed to var_dump, the prefix that var_dump prepends (the part that says string(59)) is also malformed. Perhaps this is just a side effect of the malformed text being printed, but I thought it was odd nonetheless.
Could this be related?
It's likely your terminal or shell environment is not configured to UTF-8.
@cboden, unfortunately, that is not the case.
I actually created a Stack Overflow post about the problem.
I originally used a different WebSocket library known as Hoa, though that had some other issues which meant I couldn't use it. When Hoa was being used, however, I had no issues with malformed characters.
Also, I tried buffering the output of the script, and then writing it to a file and opening it with a UTF-8-compatible text editor, and exactly the same issue with malformed characters was occurring.
Can you use Wireshark and screen shot what data looks like through that?
Also, can you hard-code a UTF-8 character in your server (like ✓ and 😀) to send to clients. What does that look like in Wireshark and Chrome?
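When hard-coding the test characters, writing them as Unicode escapes keeps the result independent of the encoding the PHP source file happens to be saved in. A minimal sketch (PHP 7 or later); the variable names are only for illustration, and the `send` call is shown as a comment because it needs a live connection:

```php
<?php
// Unicode escapes always produce UTF-8 bytes, whatever the file's own encoding is.
$checkMark = "\u{2713}";  // ✓
$grinning  = "\u{1F600}"; // 😀

// These are the byte sequences to look for in the Wireshark capture.
echo bin2hex($checkMark), PHP_EOL; // → e29c93
echo bin2hex($grinning), PHP_EOL;  // → f09f9880

// Inside the server's onOpen handler this could then be sent as:
// $conn->send($checkMark . $grinning);
```

If Wireshark shows these exact bytes in the frame payload and the browser renders both characters, the server-to-client path is handling UTF-8 correctly.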
@cboden
Hi there, I don't have the source code where this bug emerged any longer - nor the same local dev environment. Nor can I remember anything about the setup at the time.
Apologies for that, hopefully this was just some environment-specific occurrence limited to my setup.
I'm having the opposite issue. Special characters are OK in a telnet terminal connected to the server. But I also put a MySQL statement in the server script to log the client->server messages, and the output in SQL is garbage. The database table is of course set to utf8_general_ci, and other PHP applications on the same server write to the same database correctly.
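A table collation of utf8_general_ci only describes how the data is stored; the connection has its own character set, and a connection opened inside the server script may not be configured the same way as the one the other applications use. A minimal sketch, assuming PDO is used for the logging (the comment does not say which API is in use); the helper name, host, database and credentials are placeholders:

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that sets the connection charset explicitly.
function buildUtf8Dsn(string $host, string $database): string
{
    // The charset part makes the client connection talk utf8mb4 to the server.
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $database);
}

$dsn = buildUtf8Dsn('127.0.0.1', 'example_db');
echo $dsn, PHP_EOL;
// → mysql:host=127.0.0.1;dbname=example_db;charset=utf8mb4

// With real credentials the connection would be opened like this:
// $pdo = new PDO($dsn, 'example_user', 'example_password');
// With mysqli, the equivalent step is $mysqli->set_charset('utf8mb4');
```

Comparing the result of `SHOW VARIABLES LIKE 'character_set_%'` on the server script's connection with the same query from one of the working applications would show whether the two connections differ.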
Examine this https://jqcode.space/questions/special-characters-are-included-in-the-word-count explanation
Hey! Thanks for reporting this issue 🙏
We're giving Ratchet some much-needed love and attention! As part of our issue cleanup initiative (#1100), we're reviewing and closing long-standing issues to prioritize what our community currently needs most.
The character encoding issues you're experiencing are typically related to inconsistent encoding handling across different layers of your application stack. Without knowing the specific source of your data and the exact encoding chain, it's difficult to pinpoint the root cause. Modern best practice is to use UTF-8 consistently throughout the entire application stack.
If this is still relevant to you, we'd love to continue this conversation in our GitHub Discussions. Just reference this issue when you post. If you want to support Ratchet, please consider sponsoring our work! ❤️