pycrate Constructed object extended component's length determinant check problem in UPER

Hi, I have encountered a problem with the length determinant check for Open Types in UPER when they contain types with extensions. Suppose we have the following ASN.1 definition:

    S1 ::= SEQUENCE {
        t0 SEQUENCE {
            e1 BOOLEAN,
            ...,
            [[ e2 S2 OPTIONAL ]]
        },
        t1 BOOLEAN
    }

    S2 ::= SEQUENCE {
        t0 BOOLEAN,
        ...,
        t1 BOOLEAN OPTIONAL
    }

The correct encoding of the value v0 = {"t0" : {"e1" : True, "e2" : {"t0" : True}}, "t1" : True} is c0406820. However, if we alter the length determinant of the extension group containing e2 and feed the result back to the decoder, it does not raise a length determinant error (which it should). For example, given the input c040a82000 (the change is shown below), pycrate still accepts the message and decodes it back to v0 (the incorrect encoding needs some trailing zero padding to be accepted).

1 | 1 | 0000000 | 1 | 00000001 | xxxxxxxx | 1
                            10
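
The bit layout above can be reproduced by hand. Here is a minimal sketch in plain Python that does not use pycrate. The helper names `bits_to_hex` and `encode_v0` are made up for this illustration, and the field layout is hard-coded for this one value rather than derived from the ASN.1 definition.

```python
def bits_to_hex(bits: str) -> str:
    """Right-pad a bit string with zeros to whole octets and return it as hex."""
    bits += "0" * (-len(bits) % 8)
    return int(bits, 2).to_bytes(len(bits) // 8, "big").hex()


def encode_v0(length: int, extra_octets: int = 0) -> str:
    """Hand-assemble the UPER bits of v0 with a chosen open type length."""
    fields = [
        "1",                    # t0: extension bit, extensions are present
        "1",                    # t0.e1 = TRUE
        "0000000",              # number of extension additions minus 1
        "1",                    # presence bitmap: the extension group is present
        format(length, "08b"),  # open type length determinant, in octets
        "101" + "00000",        # e2 present, S2 extension bit 0, S2.t0 TRUE, padding
        "1",                    # t1 = TRUE
    ]
    # extra_octets appends trailing zero octets after the padded message
    return bits_to_hex("".join(fields)) + "00" * extra_octets


print(encode_v0(1))                  # correct length determinant
print(encode_v0(2, extra_octets=1))  # altered length determinant, zero-padded
```

Under these assumptions the two calls give the correct encoding and the altered one quoted above.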

Is there any problem with the length determinant checking process? Thank you.

Mar 13 '25 15:03 zhouxt1

I've not had any problem with length determinants in UPER so far. The ASN.1 case you describe looks very specific, however: an optional field in a sequence, inside an extension group with only optional fields, in another sequence. I'm sorry, but I don't understand exactly the issue you are facing. Could you provide a Python code snippet? That would help me understand it.

Mar 13 '25 20:03 mitshell

Hi, let me give you a more detailed description. Suppose we have the following ASN.1 definition and compile it with the pycrate compiler.

Foo DEFINITIONS AUTOMATIC TAGS ::= BEGIN
    S1 ::= SEQUENCE {
        t0 SEQUENCE {
            e1 BOOLEAN,
            ...,
            e2 S2 OPTIONAL
        },
        t1 BOOLEAN
    }

    S2 ::= SEQUENCE {
        t0 BOOLEAN,
        ...,
        t1 BOOLEAN OPTIONAL
    }
END

Let's say we compile it to test.py. Then consider the following code:

import test

g = test.Foo.S1

# we can encode some value of type S1
v = {"t0": {"e1": True, "e2": {"t0": True}}, "t1": True}

r1 = g.to_uper(v)
print(r1.hex())
# prints c0405020, which is the correct encoding of v

# a malformed encoding where the length determinant of the extension `e2` is altered
r2 = bytes.fromhex('c040902000')

g.from_uper(r2)
print(g())
# decodes r2 successfully and prints
# {'t0': {'e1': True, 'e2': {'t0': True}}, 't1': True}

In message r2, the changed part is in the extension field S1->t0->e2. More specifically, the length determinant of the open type is changed from 1 to 2. But pycrate does not raise any error for the incorrect length determinant.

Mar 14 '25 02:03 zhouxt1

OK, I get it:

In [24]: g.set_val(v)

In [25]: g.to_uper_ws().hex()
Out[25]: 'c0405020'

In [26]: print(g._struct.show())
### S1 ###
 ### t0 ###
  <E : 1>
  ### e1 ###
   <V : 1 (TRUE)>
  <big : 0>
  <C : 0>
  <B : 0b1>
  <C_form : 0 (short)>
  <C : 1>
  ### S2 ###
   <E : 0>
   ### t0 ###
    <V : 1 (TRUE)>
   <P : 0b000000>
 ### t1 ###
  <V : 1 (TRUE)>
 <P : 0b00000>

In [27]: g.from_uper_ws(bytes.fromhex('c040902000'))

In [29]: g() == v
Out[29]: True

In [30]: print(g._struct.show())
### S1 ###
 ### t0 ###
  <E : 1>
  ### e1 ###
   <V : 1 (TRUE)>
  <big : 0>
  <C : 0>
  <B : 0b1>
  <C_form : 0 (short)>
  <C : 2>
  ### e2 ###
   <E : 0>
   ### t0 ###
    <V : 1 (TRUE)>
   <P : 0b000000>
 ### t1 ###
  <V : 1 (TRUE)>
 <P : 0b00000>

So the 2nd buffer has a length determinant of 2, instead of 1, for the component e2, which is a SEQUENCE, while this field actually encodes to a single byte. I believe the decoder succeeds because there is a single BOOLEAN field in the e2 sequence: it decodes that field and just returns to the parent object without checking the remaining length. But I'll double check.

Do you think the decoder should stop decoding the buffer and raise, or just log a warning? Do you know if ITU-T X.691 on PER encoding provides any specific guidance or rule for this case?

Mar 15 '25 00:03 mitshell

I think that the decoder should raise, because pycrate already checks the number of remaining bytes to some extent. In this case, if we feed the parser c0409020 as input, it raises pycrate_asn1rt.err.ASN1PERDecodeErr: length determinant too long. In another case, where the length determinant is too short, it raises a bitlen overflow error.
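
To make the proposed behaviour concrete, here is a rough sketch of such a check in plain Python. It is not pycrate code: `BitReader` and `decode_s1` are hypothetical names, the decoder only handles S1 values of the exact shape discussed in this thread, and treating "octets used by the content != declared length" as an error is my suggested policy, not a rule quoted from X.691.

```python
class BitReader:
    """Minimal MSB-first bit reader over a bytes object."""

    def __init__(self, data: bytes):
        self.bits = "".join(format(b, "08b") for b in data)
        self.pos = 0

    def read(self, n: int) -> int:
        if self.pos + n > len(self.bits):
            raise ValueError("buffer too short")
        value = int(self.bits[self.pos:self.pos + n], 2)
        self.pos += n
        return value


def decode_s1(data: bytes) -> dict:
    """Toy decoder for S1 where t0 carries exactly the e2 extension."""
    r = BitReader(data)
    ext_bit = r.read(1)
    e1 = bool(r.read(1))
    count_minus_1 = r.read(7)
    bitmap = r.read(1)
    if (ext_bit, count_minus_1, bitmap) != (1, 0, 1):
        raise ValueError("toy decoder only handles one present extension")
    length = r.read(8)                       # open type length (short form assumed)
    start = r.pos
    r.read(1)                                # S2 extension bit (assumed 0)
    e2_t0 = bool(r.read(1))
    used_octets = -(-(r.pos - start) // 8)   # content bits rounded up to octets
    if used_octets != length:
        raise ValueError(
            f"open type length is {length}, but content used {used_octets} octet(s)"
        )
    r.pos = start + 8 * length               # resume after the open type content
    t1 = bool(r.read(1))
    return {"t0": {"e1": e1, "e2": {"t0": e2_t0}}, "t1": t1}
```

With this sketch, `decode_s1(bytes.fromhex('c0405020'))` returns the original value, while `decode_s1(bytes.fromhex('c040902000'))` raises `ValueError` instead of silently accepting the altered length.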

Also, extensions are designed for backward/forward compatibility. Suppose we have an older version of the definition of S1, say S3, where e2 is not defined yet. It is defined as

S3 ::= SEQUENCE {
    t0 SEQUENCE {
        e1 BOOLEAN,
        ...
    },
    t1 BOOLEAN
}

We give it the same input,

>>> f = test.Foo.S3
>>> f.from_uper_ws(bytes.fromhex('c040902000'))
>>> f()
{'t0': {'e1': True, '_ext_0': b'@\x80'}, 't1': False}

In this example, we can see that the decoder correctly skips the two bytes declared by the length determinant. But as a result, t1 is parsed to a different value (False) than when the same input is decoded with S1 (True).
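
The disagreement on t1 can be illustrated with a small sketch in plain Python. The bit offsets are hard-coded for this exact message shape, the function name `t1_under_both_policies` is made up, and the "resume after the consumed octets" policy is only my reading of the S1 output in this thread, not something taken from the pycrate source.

```python
def t1_under_both_policies(data: bytes) -> dict:
    """Read t1 at the two positions a decoder could resume from."""
    bits = "".join(format(b, "08b") for b in data)
    length = int(bits[10:18], 2)  # open type length determinant, in octets
    content_start = 18            # first bit of the open type content
    inner_bits = 2                # S2: extension bit plus t0
    after_declared = content_start + 8 * length
    after_consumed = content_start + 8 * -(-inner_bits // 8)
    return {
        "resume_after_declared_length": bits[after_declared] == "1",
        "resume_after_consumed_octets": bits[after_consumed] == "1",
    }


print(t1_under_both_policies(bytes.fromhex("c0405020")))
print(t1_under_both_policies(bytes.fromhex("c040902000")))
```

For the correct encoding both positions coincide and t1 reads as True. For the altered one, resuming after the declared length gives False (as with S3), while resuming after the consumed octet gives True (as with S1), so two decoders that differ only in the schema version would disagree on the same bytes.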

Also, I tested the same case with ASN1c, and it returned an error.

As for X.691, it only describes how to encode an Open Type and is not very specific about the decoding procedure.

Mar 15 '25 04:03 zhouxt1