Unicode symbols?

Open MuthaX opened this issue 3 years ago • 2 comments

Does this plugin support strings where every cell is a separate Unicode symbol (UTF-32 encoded)?

MuthaX, Mar 27 '22 14:03

I honestly don't know. That should be tested.

Y-Less, Mar 28 '22 22:03

> I honestly don't know. That should be tested.

So... I tested this and got a negative result. I used my own script to generate the UTF-8 and Unicode strings (https://github.com/MuthaX/PawnUTF)

// The source-code file in UTF-8.
#include <a_samp>
#include <sscanf2>
#include <PawnUtfConverter>
stock PrintChars(const header_msg[], const array[]) {
	printf("printing: %s", header_msg);
	new len = strlen(array);
	for(new i = 0; i < len; ++i) {
		printf("%d|%x (%d)", i, array[i], array[i]);
	}
}
main() {
	new
		utf8_stream[] = "some смешанный 123 теxt",
		utf8_string[sizeof(utf8_stream)],
		unicode_string[sizeof(utf8_string)],
		uword_1[16],
		uword_2[16],
		u_number,
		uword_3[16]
	;
	//	The compiler stores the content of utf8_stream as "1 byte = 1 cell", so each Cyrillic (non-ASCII, code >= 128) symbol occupies 2 cells ...
	//	... so we merge the bytes from separate cells into whole UTF-8 symbols to get "1 symbol = 1 cell".
	PawnUTF_StreamToUTF(utf8_stream, sizeof(utf8_stream), utf8_string, sizeof(utf8_string), false);
	PrintChars("utf8_string", utf8_string);
	//	Now we translate the UTF-8 string into a Unicode string (one code point per cell, equivalent to UTF-32).
	PawnUTF_StringUTF_ToUnicode(utf8_string, sizeof(utf8_string), unicode_string, sizeof(unicode_string));
	PrintChars("unicode_string", unicode_string);
	sscanf(unicode_string, "p< >s[16]s[16]ds[16]", uword_1, uword_2, u_number, uword_3);
	PrintChars("uword_1", uword_1);
	PrintChars("uword_2", uword_2);
	PrintChars("uword_3", uword_3);
	return 1;
}

And output is:

printing: utf8_string
0|73 (115)
1|6F (111)
2|6D (109)
3|65 (101)
4|20 (32)
5|D181 (53633)
6|D0BC (53436)
7|D0B5 (53429)
8|D188 (53640)
9|D0B0 (53424)
10|D0BD (53437)
11|D0BD (53437)
12|D18B (53643)
13|D0B9 (53433)
14|20 (32)
15|31 (49)
16|32 (50)
17|33 (51)
18|20 (32)
19|D182 (53634)
20|D0B5 (53429)
21|78 (120)
22|74 (116)
printing: unicode_string
0|73 (115)
1|6F (111)
2|6D (109)
3|65 (101)
4|20 (32)
5|441 (1089)
6|43C (1084)
7|435 (1077)
8|448 (1096)
9|430 (1072)
10|43D (1085)
11|43D (1085)
12|44B (1099)
13|439 (1081)
14|20 (32)
15|31 (49)
16|32 (50)
17|33 (51)
18|20 (32)
19|442 (1090)
20|435 (1077)
21|78 (120)
22|74 (116)
printing: uword_1
0|73 (115)
1|6F (111)
2|6D (109)
3|65 (101)
printing: uword_2
0|41 (65)
1|3C (60)
2|35 (53)
3|48 (72)
4|30 (48)
5|3D (61)
6|3D (61)
7|4B (75)
8|39 (57)
printing: uword_3
0|42 (66)
1|35 (53)
2|78 (120)
3|74 (116)

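As you can see, the symbols of the Unicode string are simply truncated to 1 byte each: for example, 0x441 in unicode_string comes back as 0x41 in uword_2, and 0x442 comes back as 0x42 in uword_3.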
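To make the pattern easier to check, here is a minimal C sketch of the truncation. It assumes (I have not verified this in the plugin source) that each 32-bit cell gets narrowed to a single byte somewhere along the way. `narrow_cells` is a hypothetical helper written only for this illustration; it is not a function from sscanf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the sscanf plugin): keep only the low
 * byte of every 32-bit cell, which is what storing a cell in a one-byte
 * char would do. Assumption: this mirrors the behaviour seen in the output.
 */
static void narrow_cells(const uint32_t *cells, size_t count, unsigned char *out)
{
    assert(cells != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i) {
        /* 0x441 & 0xFF == 0x41, 0x43C & 0xFF == 0x3C, and so on. */
        out[i] = (unsigned char)(cells[i] & 0xFFu);
    }
}
```

Feeding it the nine code points printed for the second word (441 43C 435 448 430 43D 43D 44B 439) gives 41 3C 35 48 30 3D 3D 4B 39, the same bytes that ended up in uword_2.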

MuthaX, Mar 29 '22 19:03