
new method force_utf8

Open rurban opened this issue 9 years ago • 6 comments

utf8->decode sets the expected type of the input string, but we are missing a method to enforce the utf8-ness of the result string(s). Thus a result string whose only non-ASCII characters are valid latin1 chars in the 128-255 range will not have the utf8 flag set. We might want to add this force_utf8 method to set the utf8-ness of the result. This is currently only controlled by the use/no utf8 pragma, so we would need this method to be context-insensitive.
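To make the ambiguity concrete, here is a minimal sketch using only the core utf8:: functions. It does not call Cpanel::JSON::XS at all; the variable names are illustrative. It shows that the same 128-255 character can be held with or without the utf8 flag while still comparing equal as a string.

```perl
use strict;
use warnings;

# "\xe9" is é (U+00E9), a character in the ambiguous 128-255 range.
my $native   = "\xe9";       # stored as a single byte, utf8 flag off
my $upgraded = "\xe9";
utf8::upgrade($upgraded);    # same character, now stored as UTF-8 internally

printf "native:   flag=%d length=%d\n",
    (utf8::is_utf8($native)   ? 1 : 0), length $native;
printf "upgraded: flag=%d length=%d\n",
    (utf8::is_utf8($upgraded) ? 1 : 0), length $upgraded;

# The flag is an internal storage detail: both hold the same one character.
print "equal as strings\n" if $native eq $upgraded;
```

Code that inspects the flag, or passes the string to something that reads the internal bytes, is where the two representations start to behave differently.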

Requested by @brainbuz at YAPC::NA 2015; he will come up with a test case or data.

rurban Jun 07 '15 21:06

I put an example up at http://www.brainbuz.org/docs/cpanel_json_xs.tgz

The description field of the JSON file contains é both literally and as its ASCII JSON escape \u00e9. In the output, one comes out correct and the other spaghettified. Uncomment use utf8::all and both inputs for é will print correctly. I tried this on 5.20 and 5.22 on Debianized Linux (results may be different on other OSes).

brainbuz Jun 19 '15 05:06

Is this basically asking that every string returned by decode be utf8::upgrade-d?
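As a rough sketch of what "every string utf8::upgrade-d" could look like when done by hand after decoding: the helper below is hypothetical (upgrade_strings is not part of Cpanel::JSON::XS), and the sample structure stands in for whatever decode returned. It walks hashes and arrays in place and upgrades each plain scalar value. Two assumptions to keep in mind: hash keys are left alone, and the sketch does not distinguish numbers from strings, so a numeric value would be stringified by utf8::upgrade.

```perl
use strict;
use warnings;

# Hypothetical helper, not a Cpanel::JSON::XS method: recursively
# utf8::upgrade every plain scalar value in a decoded structure, in place.
sub upgrade_strings {
    my ($node) = @_;
    if (ref($node) eq 'HASH') {
        upgrade_strings($_) for values %$node;    # values are aliased
    }
    elsif (ref($node) eq 'ARRAY') {
        upgrade_strings($_) for @$node;           # elements are aliased
    }
    elsif (!ref($node) && defined $node) {
        utf8::upgrade($_[0]);    # $_[0] aliases the caller's scalar
    }
    # Other references (e.g. blessed boolean objects) are skipped.
    return;
}

# Stand-in for a decoded document; "\xe9" and "\xef" are in the 128-255 range.
my $data = { description => "caf\xe9", tags => [ "na\xefve", "plain" ] };
upgrade_strings($data);

print "description flagged\n" if utf8::is_utf8($data->{description});
print "first tag flagged\n"   if utf8::is_utf8($data->{tags}[0]);
```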

I have a similar problem, but I want all strings downgraded (with FAIL_OK true). See https://stackoverflow.com/questions/47893579/decode-json-without-utf8-flagged-strings
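For reference, a minimal sketch of what downgrading with FAIL_OK true does on individual strings, using only core utf8::downgrade. The sample strings are made up for illustration and are not taken from the linked question.

```perl
use strict;
use warnings;

my $latin1 = "caf\xe9";
utf8::upgrade($latin1);           # simulate a flagged string from a decoder

my $wide = "snowman \x{2603}";    # contains a character above 255

# Second argument is FAIL_OK: return false on failure instead of dying.
my $ok_latin1 = utf8::downgrade($latin1, 1);    # succeeds, flag is cleared
my $ok_wide   = utf8::downgrade($wide,   1);    # fails, string is untouched

printf "latin1: ok=%d flag=%d\n",
    ($ok_latin1 ? 1 : 0), (utf8::is_utf8($latin1) ? 1 : 0);
printf "wide:   ok=%d flag=%d\n",
    ($ok_wide ? 1 : 0), (utf8::is_utf8($wide) ? 1 : 0);
```

Strings that fit in the 0-255 range lose the flag; anything wider is simply left as it was, which is the behavior the FAIL_OK request relies on.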

Maybe new options upgrade, downgrade, and downgrade_fail_ok? Or a single option with a parameter that supports any of those? Or failing that, a filter_json_string like filter_json_object but called for every string (though I'd prefer a method name containing "hook", not "filter").

ysth Dec 21 '17 19:12

It is a bug for a module to depend on strings being downgraded. I don't think it's worth complicating the interface of this module to work around that.

Grinnz Dec 21 '17 19:12

No, it's only for the 128-255 range, where the outcome is ambiguous. But I agree, it looks a bit too complicated. Let's see if someone will come up with a PR.

rurban Dec 21 '17 19:12

Yes, it is certainly a bug for a module to depend on strings being downgraded. But sometimes you have to work around bugs. :)

An alternative to changing the interface (for my problem, not the problem called out in this bug) would be to make the binary option not croak on \u00xx as it does currently.

ysth Dec 22 '17 17:12

> Yes, it is certainly a bug for a module to depend on strings being downgraded.

I fully agree.

And if another module is buggy, then that module should be fixed or worked around, not the JSON module. Even if the other module cannot be fixed, its user can still call utf8::upgrade or utf8::downgrade manually.

So I do not think that Cpanel::JSON::XS needs such new and buggy functionality.

pali Jan 31 '18 08:01