pycrate

Problem with Decoding of UTF8String in UPER

Open zhouxt1 opened this issue 7 months ago • 2 comments

Hi, I found a potential inconsistency between encoding and decoding. When I define a UTF8String type with a length constraint in UPER, the decoder cannot decode the value produced by the encoder. For example, if I define

S1 ::= SEQUENCE {
        protocolId UTF8String (SIZE(1..4))
}

Then if I provide the value {"protocolId" : "aββ"}, I can successfully encode it to d873acb3ac80 in hex. But decoding this value results in an error:

pycrate_asn1rt.err.ASN1PERDecodeErr: S1.protocolId: invalid character, Python codec error, 'utf-8' codec can't decode byte 0xce in position 3: unexpected end of data

A similar issue: when I encode the value βββ, I get f3acb3acb3ac80 in hex, but the decoded value is ββ instead of the original value.
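Both symptoms can be reproduced by hand from the hex strings above. The sketch below is plain Python, not pycrate code; `read_bits` and `split_constrained` are hypothetical helper names, and it assumes the decoder reads a 2-bit length field for SIZE(1..4) and then takes that many octets. Under that assumption the length field in both outputs is binary 11, i.e. 4 octets, although the payloads are 5 and 6 octets long.

```python
def read_bits(data: bytes, pos: int, n: int) -> int:
    """Return n bits starting at bit offset pos, as an unsigned integer."""
    total = int.from_bytes(data, "big")
    shift = len(data) * 8 - pos - n
    return (total >> shift) & ((1 << n) - 1)


def split_constrained(hex_str: str, lb: int = 1, ub: int = 4):
    """Split a buffer into (length field value, octets that follow it).

    Assumption: the length is read as a constrained whole number for
    SIZE(lb..ub), which takes 2 bits for SIZE(1..4).
    """
    data = bytes.fromhex(hex_str)
    width = (ub - lb).bit_length()
    length = lb + read_bits(data, 0, width)
    # Whole octets available after the length field (trailing bits are padding).
    avail = (len(data) * 8 - width) // 8
    octets = bytes(read_bits(data, width + 8 * i, 8) for i in range(avail))
    return length, octets


# First report: "aββ" is 5 octets in UTF-8, but the length field says 4.
length, octets = split_constrained("d873acb3ac80")
print(length, octets.hex())          # 4 61ceb2ceb2
try:
    octets[:length].decode("utf-8")
except UnicodeDecodeError as err:
    print(err)                       # the cut falls inside the second β

# Second report: "βββ" is 6 octets, and the first 4 decode cleanly to "ββ".
length, octets = split_constrained("f3acb3acb3ac80")
print(octets[:length].decode("utf-8"))   # ββ
```

The full payload is present in both buffers; only the length field is too small to describe it, which is why the first case fails in the middle of a character and the second silently drops one.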

As for a fix, X.691 30.6 says: "This subclause applies to character strings that are not known-multiplier character strings. In this case, constraints are never PER-visible, and the type can never be extensible for PER encoding." UTF8String is not a known-multiplier character string, so to encode and decode it correctly, the size constraint (SIZE (1..4)) should be ignored.
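As a rough illustration of what ignoring the constraint means on the wire, here is a minimal sketch in plain Python (not pycrate code; both function names are hypothetical). It assumes the unconstrained length determinant, which for fewer than 128 octets is a single octet holding the octet count, and it relies on S1 having no extension marker and no OPTIONAL components, so no preamble bits precede the field.

```python
def uper_utf8_unconstrained(value: str) -> bytes:
    """Encode a UTF8String with an unconstrained length determinant.

    Sketch only: covers the single-octet length form (fewer than 128 octets).
    """
    octets = value.encode("utf-8")
    if len(octets) >= 128:
        raise ValueError("sketch only covers lengths below 128 octets")
    # The length counts octets, not characters.
    return bytes([len(octets)]) + octets


def uper_utf8_decode(data: bytes) -> str:
    """Inverse of uper_utf8_unconstrained, under the same assumptions."""
    length = data[0]
    if length >= 128:
        raise ValueError("sketch only covers lengths below 128 octets")
    return data[1:1 + length].decode("utf-8")


for text in ("aββ", "βββ"):
    encoded = uper_utf8_unconstrained(text)
    print(encoded.hex(), uper_utf8_decode(encoded) == text)
# 0561ceb2ceb2 True
# 06ceb2ceb2ceb2 True
```

With the length carried as an octet count in its own octet, both example values round-trip regardless of how many bytes each character needs.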

Could you look into this problem? Thanks!

zhouxt1 avatar May 01 '25 19:05 zhouxt1

Thanks for the report. The latest commits provide a fix and a test case: https://github.com/pycrate-org/pycrate/commit/389c0e5aefb67600a3276865da542147721b7623 . This covers SIZE constraints for sure, but I'm unsure whether it also covers alphabet constraints (which I have never encountered for UTF8String, however).

mitshell avatar May 03 '25 15:05 mitshell

In X.691 30.1, it says: The following restricted character string types are known-multiplier character string types: NumericString, PrintableString, VisibleString (ISO646String), IA5String, BMPString, and UniversalString. Effective permitted-alphabet constraints are PER-visible only for these types.

Since UTF8String is not a known-multiplier character string type, I think we can just ignore the alphabet constraints.

zhouxt1 avatar May 05 '25 03:05 zhouxt1

Do you think another fix is required, or is the one provided OK? If it is OK, you can just close the issue. Thx

mitshell avatar Jun 24 '25 19:06 mitshell