UTF-8 Support
I am testing the signature -> set_metadata_props feature, but the text is not displayed correctly. My signing reason is "ทดสอบ".
Does it support UTF-8 encoding? Thank you.
Try #79
Hello, I tried #79.
It truncates some characters: for example, "ภาษาไทย" comes back as "ภา".
So I changed this:
return "\xFE\xFF" . mb_convert_encoding($string, 'UTF-16BE', $encoding);
to this:
return "\xEF\xBB\xBF".mb_convert_encoding($string, 'UTF-8', $encoding);
Now it is displayed correctly.
Thank you.
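For readers comparing the two variants, here is a minimal sketch that prints the bytes each one produces. The helper names `to_utf16be_text` and `to_utf8_text` are hypothetical and not part of sapp; they only wrap the two return statements quoted above. It assumes the mbstring extension and a UTF-8 source file. As far as I know, the `\xFE\xFF` prefix marks a UTF-16BE text string in PDF, while a UTF-8 BOM is only defined for text strings from PDF 2.0 on, so older viewers may not honor it.

```php
<?php
// Hypothetical helpers (not sapp's API): they mirror the two return statements above.

// UTF-16BE with a byte order mark (FE FF).
function to_utf16be_text(string $string, string $encoding = 'UTF-8'): string {
    return "\xFE\xFF" . mb_convert_encoding($string, 'UTF-16BE', $encoding);
}

// UTF-8 with a byte order mark (EF BB BF).
function to_utf8_text(string $string, string $encoding = 'UTF-8'): string {
    return "\xEF\xBB\xBF" . mb_convert_encoding($string, 'UTF-8', $encoding);
}

$reason = "ทดสอบ";
// Dump both encodings as hex so the raw bytes can be compared.
echo bin2hex(to_utf16be_text($reason)), PHP_EOL;
echo bin2hex(to_utf8_text($reason)), PHP_EOL;
```

Looking at the hex dump makes it easier to see which bytes end up between the parentheses of the PDF string.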
With that change, I get this:
What about using a custom encoded string when setting the metadata?
That would work, but the problem is that anyone who doesn't know they have to do their own encoding will run into trouble and open a new issue.
@dealfonso One question: if the file declares ANSI as its encoding and the reason is in UTF-8 or another encoding, wouldn't this problem occur?
Look, I sent UTF-8 and it doesn't work:
/Reason(ภาษาไทย)/Location(sdfs ó à Ã)>>
But when I sent ISO-8859-1, I got this:
/Reason(ó í í {} ` ~)/Location(sdfs ó í í)>>
Honestly, I had not considered this topic before. A quick Google search [1] tells me that PDF does not seem to handle character encoding in a general way. Instead, the encoding depends on the font, and the same character code can be rendered differently depending on the font.
I don't know how this applies to the reason and so on.
That is why my "quick answer" is that PDF does not support UTF-8, so users need to encode the characters according to their needs.
I'll read more about character encoding in the metadata. Do you have any source of info to read?
[1] https://www.gnostice.com/nl_article.asp?id=383&t=Font_and_Encoding_Standard_types_supported_in_PDF_for_the_representation_of_text_content
"It considers that the encoding depends on the font, and depending on the font, the same character will show a representation or another"
But that applies to text content; metadata doesn't use fonts.
I tried FPDF, and it works with UTF-8:
/Keywords (þÿ Ì + ^ ì ò Ò ê)
But here it doesn't work https://github.com/Setasign/FPDF/blob/0838e0ee4925716fcbbc50ad9e1799b5edfae0a0/fpdf.php#L1169C1-L1189C2
I tried signing with TCPDF, and it works with UTF-8 too. When opened in VS Code:
When signed with sapp, the text seems to be stored as plain text:
I tried #79, which encodes the metadata as UTF-16BE with a BOM, and it works.
The problem is that when the string contains ศ ("\x0E\x28") or ษ ("\x0E\x29"), the metadata is broken. I think that when the string is written to the PDF, the encoded bytes contain "(" (0x28) or ")" (0x29), which causes the incorrect display.
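This suspicion can be checked at the byte level: in UTF-16BE, ศ (U+0E28) ends in byte 0x28, which is ASCII "(", and ษ (U+0E29) ends in byte 0x29, which is ASCII ")". In a PDF literal string, unbalanced parentheses and backslashes have to be escaped with a backslash, and a raw carriage return is read as a line feed. Below is a minimal sketch of escaping after encoding. The helper name `pdf_literal_string` is hypothetical and not part of sapp, it assumes the mbstring extension and UTF-8 input, and it is not necessarily how any existing fix handles this.

```php
<?php
// Hypothetical helper (not sapp's API): builds a PDF literal string "( ... )"
// from text, escaping the bytes that would otherwise break the string.
function pdf_literal_string(string $text, string $encoding = 'UTF-8'): string {
    // Encode first: the problematic bytes only appear after conversion.
    $bytes = "\xFE\xFF" . mb_convert_encoding($text, 'UTF-16BE', $encoding);

    // Escape at the byte level: backslash, both parentheses, and carriage return.
    $escaped = strtr($bytes, [
        "\\" => "\\\\",
        "("  => "\\(",
        ")"  => "\\)",
        "\r" => "\\r",
    ]);

    return "(" . $escaped . ")";
}

// The ")" byte inside ษ is now preceded by a backslash.
echo bin2hex(pdf_literal_string("ภาษาไทย")), PHP_EOL;
```

The order matters in this sketch: escaping the text before converting it to UTF-16BE would miss these bytes, because they only exist in the encoded form.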
For example, I set my string to "ภาษาไทย".
It is displayed like this:
As a dirty fix, I added "(" or ")" at the beginning or end of the string.
So I added "(" to the beginning of "ภาษาไทย", like this: "(ภาษาไทย".
Then my signature is displayed like this:
The text is displayed correctly, but it still has "(" in front of it.
Is there a correct way to deal with this problem?
Thank you.
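One possible alternative to padding the text with extra parentheses is to write the value as a PDF hexadecimal string, which is delimited by "<" and ">" and contains only hex digits, so no byte of the encoded text can be mistaken for a delimiter. A minimal sketch follows. The helper name `pdf_hex_string` is hypothetical and not part of sapp, and it assumes the mbstring extension and UTF-8 input.

```php
<?php
// Hypothetical helper (not sapp's API): encodes text as UTF-16BE with a BOM
// and writes it as a PDF hexadecimal string, e.g. <FEFF...>.
function pdf_hex_string(string $text, string $encoding = 'UTF-8'): string {
    $bytes = "\xFE\xFF" . mb_convert_encoding($text, 'UTF-16BE', $encoding);
    // bin2hex() yields lowercase digits; uppercase is only a readability choice.
    return "<" . strtoupper(bin2hex($bytes)) . ">";
}

echo pdf_hex_string("ภาษาไทย"), PHP_EOL;
// prints <FEFF0E200E320E290E320E440E170E22>
```

The trade-off is size: a hexadecimal string takes two characters per byte, which is usually acceptable for short metadata values such as a signing reason.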
Feel free to make a PR with the fix 👍
Could you please check #84
Thank you.
merged