Problem with Decoding of UTF8String in UPER
Hi, I found a potential inconsistency between UPER encoding and decoding. When I define a UTF8String type with a size constraint, the decoder cannot decode the value produced by the encoder. For example, if I define
S1 ::= SEQUENCE {
    protocolId UTF8String (SIZE(1..4))
}
Then, if I provide the value {"protocolId" : "aββ"}, I can successfully encode it to d873acb3ac80 in hex. However, decoding this value results in an error:
pycrate_asn1rt.err.ASN1PERDecodeErr: S1.protocolId: invalid character, Python codec error, 'utf-8' codec can't decode byte 0xce in position 3: unexpected end of data
A similar issue: when I encode the value βββ, I get f3acb3acb3ac80, but the decoded value is ββ instead of the original value.
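Both symptoms match what happens when only the first 4 octets of the UTF-8 data are decoded. The sketch below is plain Python and does not call pycrate; the 4-octet cut is an assumption inferred from the error message and the SIZE(1..4) upper bound, and `cut_and_decode` is a hypothetical helper, not part of any library.

```python
# Plain Python, no pycrate involved. The 4-octet cut is an assumption
# inferred from the reported error message and the SIZE(1..4) upper bound.
MAX_SIZE = 4


def cut_and_decode(text: str, max_octets: int = MAX_SIZE) -> str:
    """Decode only the first max_octets octets of the UTF-8 form of text."""
    return text.encode("utf-8")[:max_octets].decode("utf-8")


for value in ("aββ", "βββ"):
    octets = value.encode("utf-8")
    # Character count and octet count differ because β needs two octets.
    print(value, "characters:", len(value), "octets:", len(octets), octets.hex())
    try:
        print("  decoded:", cut_and_decode(value))
    except UnicodeDecodeError as exc:
        print("  error:", exc)
```

For "aββ" the cut falls in the middle of the second β, which raises a codec error; for "βββ" it falls on a character boundary, so the last character is silently dropped.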
Regarding a fix, X.691 30.6 says:
This subclause applies to character strings that are not known-multiplier character strings. In this case, constraints are never PER-visible, and the type can never be extensible for PER encoding.
UTF8String is not a known-multiplier character string. So, to encode and decode it correctly, the size constraint (SIZE(1..4)) should be ignored.
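As a rough sketch of what ignoring the constraint means in practice: the string is carried as an unconstrained length determinant that counts octets, followed by the UTF-8 octets. The code below is plain Python, not pycrate code. `encode_utf8string_uper` and `decode_utf8string_uper` are hypothetical helpers; they assume the field starts on an octet boundary (S1 has no preamble bits) and only handle the one- and two-octet length forms, so fewer than 16384 octets and no fragmentation.

```python
def encode_utf8string_uper(text: str) -> bytes:
    """Length determinant (in octets) followed by the UTF-8 octets."""
    octets = text.encode("utf-8")
    n = len(octets)
    if n < 128:
        # Short form: one octet, top bit 0.
        length = bytes([n])
    elif n < 16384:
        # Long form: two octets, top bits '10', then a 14-bit length.
        length = bytes([0x80 | (n >> 8), n & 0xFF])
    else:
        raise ValueError("fragmented lengths are not covered by this sketch")
    return length + octets


def decode_utf8string_uper(data: bytes) -> str:
    """Inverse of encode_utf8string_uper for the same two length forms."""
    if data[0] & 0x80 == 0:
        n, start = data[0], 1
    elif data[0] & 0xC0 == 0x80:
        n, start = ((data[0] & 0x3F) << 8) | data[1], 2
    else:
        raise ValueError("fragmented lengths are not covered by this sketch")
    return data[start:start + n].decode("utf-8")


for value in ("aββ", "βββ"):
    encoded = encode_utf8string_uper(value)
    print(value, encoded.hex(), decode_utf8string_uper(encoded) == value)
```

Because the length counts octets and is not bounded by SIZE(1..4), both example values round-trip in this sketch.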
Could you look into this problem? Thanks!
Thanks for the report. The latest commits provide a fix and a test case: https://github.com/pycrate-org/pycrate/commit/389c0e5aefb67600a3276865da542147721b7623 This covers SIZE constraints for sure, but I'm unsure whether it also covers alphabet constraints (which I have never seen used with UTF8String, however).
X.691 30.1 says: The following restricted character string types are known-multiplier character string types: NumericString, PrintableString, VisibleString (ISO646String), IA5String, BMPString, and UniversalString. Effective permitted-alphabet constraints are PER-visible only for these types.
Since UTF8String is not a known-multiplier character string type, I think we can just ignore the alphabet constraints.
Do you think another fix is required, or is the one provided OK? If it is OK, you can just close the issue. Thanks.