kaitai_struct

Is it possible to store the current parse index only?

Open · ams-tschoening opened this issue 6 years ago · 5 comments

I have a byte[] containing various types and don't necessarily know in which order those types appear in the stream. Some of those types are encrypted, and I need to decrypt them by copying the whole byte[] and decrypting the individual encrypted types in place. Whether a type is encrypted is indicated by flags, and because my overall result is the whole record of individual types in unencrypted form, I need to switch those flags to indicate the unencrypted state.

The problem is knowing the positions of the bytes to change in the copied byte[]. In earlier versions I was able to calculate them from the types present in the stream, but that no longer works, because I keep getting more types, in a different order than before.

Because I encounter the bytes I need while parsing from front to end, I thought of simply storing the current parse index in some instance. Admittedly it's not particularly safe, but it's the best solution I can currently think of. Is something like that supported? The only workaround I could come up with is implementing it manually:

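```yaml
seq:
  - id:   access_nr
    type: u1

  - id:   status
    type: fmt_oms_tpl_status

  # Zero-length field: consumes no bytes, but evaluating its `if`
  # forces the lazy instance below to capture _io.pos right here.
  # `if` needs a boolean, hence the always-true comparison.
  - id:   calc_config_field_start_idx
    size: 0
    if:   config_field_start_idx >= 0

  - id:   config_field
    type: fmt_oms_tpl_config_field

instances:
  config_field_start_idx:
    value: _io.pos
```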

config_field_start_idx is always evaluated and only stores the current index without actually parsing anything, so I don't need to reset the stream position afterwards. But is there an easier way to achieve this that I'm missing?
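To see why the workaround works, here is a hand-written Python sketch of roughly what generated code does with it. It is not actual kaitai-struct-compiler output and doesn't use the kaitaistruct runtime; `status` and `config_field` are simplified to a u1 and a u2le (an assumption for illustration). The point is that value instances are lazy and cached, so the zero-size field's `if` check pins `_io.pos` to the moment just before `config_field` is read.

```python
import io
import struct


class Record:
    """Hand-written sketch of the workaround; not real KSC output.

    Assumption: status is a u1 and config_field a u2le, standing in for
    fmt_oms_tpl_status and fmt_oms_tpl_config_field.
    """

    def __init__(self, data: bytes):
        self._io = io.BytesIO(data)
        self._m_config_field_start_idx = None  # cache for the lazy instance
        self.access_nr = self._io.read(1)[0]
        self.status = self._io.read(1)[0]
        # calc_config_field_start_idx: size 0, so nothing is consumed, but
        # evaluating its `if` forces the lazy instance to run at this point.
        if self.config_field_start_idx >= 0:
            self.calc_config_field_start_idx = self._io.read(0)
        (self.config_field,) = struct.unpack("<H", self._io.read(2))

    @property
    def config_field_start_idx(self) -> int:
        # `value: _io.pos`: computed on first access, then cached.
        if self._m_config_field_start_idx is None:
            self._m_config_field_start_idx = self._io.tell()
        return self._m_config_field_start_idx


rec = Record(bytes([0x2A, 0x00, 0x34, 0x12]))
print(rec.config_field_start_idx)  # 2, although parsing has ended at 4
```

Without the forcing `if`, the first access would only happen after parsing finished, and the instance would report the end position (4 here) instead of 2.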

Thanks!

ams-tschoening · Mar 13 '18 11:03

Generally, what you're asking for here is some sort of stateful parsing with backtracking, i.e. you try one branch of parsing and then (for example, if it fails) you try another branch, retrying from a previously memorized position.

This is generally not what you'd want to write in ksy: you'd want to write declarative data structures, not an imperative parsing algorithm. The current way to do "multiple parsing attempts" in ksy is to isolate the piece to be parsed as a substream and then use several different instances operating on that substream, i.e. something like this:

```yaml
seq:
  - id: config_field
    type: dummy
    size: 0x100
instances:
  config_field_as_type1:
    io: config_field._io
    pos: 0
    type: type1
  config_field_as_type2:
    io: config_field._io
    pos: 0
    type: type2
types:
  dummy: {} # or even something meaningful here
  type1:
    # ...
  type2:
    # ...
```

The devil is, of course, in the details. You need to somehow determine the size of this "isolated portion", or at least make sure that size-eos: true is OK. You need some way to switch between these several "attempted" parsings and to recover if one of them goes astray. More likely than not, we need some way to check data and report errors (see #81), and some way to capture these errors and retry with other options.
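The substream idea can be sketched outside ksy as follows. This is a hand-written Python illustration, not generated code: `parse_type1`, `parse_type2` and their layouts are hypothetical, and the try/except fallback models the error capture and retry that ksy would still need, as noted above.

```python
import io
import struct


def parse_type1(sub: io.BytesIO) -> dict:
    # Hypothetical type1: 2-byte magic b"T1" followed by a u2le length.
    sub.seek(0)  # like `pos: 0` on the instance
    if sub.read(2) != b"T1":
        raise ValueError("not type1")
    (length,) = struct.unpack("<H", sub.read(2))
    return {"kind": "type1", "length": length}


def parse_type2(sub: io.BytesIO) -> dict:
    # Hypothetical type2: a single u4le value.
    sub.seek(0)
    raw = sub.read(4)
    if len(raw) != 4:
        raise ValueError("not type2")
    return {"kind": "type2", "value": struct.unpack("<I", raw)[0]}


def parse_config_field(stream: io.BytesIO, size: int = 0x100) -> dict:
    # Isolate a fixed-size substream, like `size: 0x100` on config_field;
    # the parent stream advances by `size` regardless of what is inside.
    sub = io.BytesIO(stream.read(size))
    # Every attempt reads the same substream from position 0; the first
    # interpretation that does not raise wins.
    for parser in (parse_type1, parse_type2):
        try:
            return parser(sub)
        except (ValueError, struct.error):
            continue
    raise ValueError("no interpretation matched")
```

Because each attempt only sees the isolated bytes, a failed attempt cannot move the parent stream, which is what makes retrying safe here.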

The hack you're using is fine as a workaround, but it's obviously stateful and not very declarative either.

GreyCat · Mar 13 '18 18:03

"multiple parsing attempts"

That is really not what I need. I only need to change individual bytes in an arbitrary byte[] at a higher level of my app. Those bytes are, e.g., flags specifying encryption modes, and after I have decrypted the data, I simply need to reset those flags.

The approach I'm currently taking tries to avoid serializing the parsed data again: I simply copy the whole byte[] already available to KS and ask KS for the indexes of my flags and the start/end positions of the data, so that I can decrypt it in place.
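A minimal Python sketch of that in-place step, assuming the parser has already reported the flag position and the encrypted region. Everything here is hypothetical: the function name, the bit mask for the encryption flags, and single-byte XOR standing in for the real cipher.

```python
def decrypt_in_place(record: bytes, flag_idx: int, enc_mask: int,
                     data_start: int, data_end: int, key: int) -> bytearray:
    """Copy the raw record, decrypt one region in place, clear its flag.

    Assumptions (hypothetical): positions come from the parser, the
    encryption mode lives in the bits `enc_mask` of the byte at
    `flag_idx`, and single-byte XOR stands in for the real cipher.
    """
    out = bytearray(record)  # copy; the original bytes stay untouched
    for i in range(data_start, data_end):
        out[i] ^= key  # placeholder "decryption"; a real cipher goes here
    out[flag_idx] &= ~enc_mask & 0xFF  # mark the region as unencrypted
    return out
```

Since only the recorded positions are needed, nothing has to be re-serialized: all other bytes of the copy are identical to the original record.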

KS knows all that already: when it parses from beginning to end, it passes right over the flags I need, and I only need to remember their positions and make them available. The problem is that those positions cannot be known beforehand; only KS can know them after parsing. Additionally, everything KS knows about the data benefits all target languages, so I make my parser pretty verbose anyway.

Think of it as a Git tag compared to a branch: really only a name for the current parse position, remembered because it has a special meaning to someone. But this is a very special case, so it might not be worth pursuing any further.

ams-tschoening · Mar 13 '18 19:03

I would like to reopen this issue for a bit of further discussion, starting by making clear that things are simpler than I may have described: I only need to store a specific parsing index, either before or after some seq attribute has been parsed. This is essentially the same kind of state that instances already provide. The stored position should then be usable everywhere KS currently supports a pos, and by higher levels of an app.

I'm only thinking of something like auto-generating a simple value instance that provides the stored position for a seq attribute. That instance can then be used by parse instances in their pos attribute and, because it's an instance, is available to all higher levels of an app as well. Something like the following in KSY:

```yaml
  - id:   config
    type: fmt_oms_tpl_config
    pos-store:      config_start_idx
    pos-store-when: before|after
```

pos-store tells the compiler to auto-generate a value instance with the given name; pos-store-when tells it which position to store, either before or after the seq attribute has been parsed. Maybe a more generic store-* would make sense as well.
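To make the proposed semantics concrete, here is a Python sketch of what code generated for a hypothetical `pos-store`/`pos-store-when` might do. Neither key exists in the compiler; `PosRecordingReader` and `read_field` are illustrative names only, and the `positions` dict stands in for the generated value instances.

```python
import io
from typing import Dict, Optional


class PosRecordingReader:
    """Sketch of code a hypothetical `pos-store` could generate."""

    def __init__(self, data: bytes):
        self._io = io.BytesIO(data)
        self.positions: Dict[str, int] = {}  # stand-ins for value instances

    def read_field(self, size: int, store: Optional[str] = None,
                   when: str = "before") -> bytes:
        if when not in ("before", "after"):
            raise ValueError("when must be 'before' or 'after'")
        if store is not None and when == "before":
            self.positions[store] = self._io.tell()  # start of the field
        value = self._io.read(size)
        if store is not None and when == "after":
            self.positions[store] = self._io.tell()  # just past the field
        return value
```

For example, after reading a 2-byte header, storing "before" on a 4-byte field records 2, and storing "after" on the following 1-byte field records 7.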

The reason I thought of this again is #546. There I'm using a type to store things, so that I can use a parse instance in combination with the _io of that type and absolute positions within that substream, which I know there but don't know in the original parent stream. That would change if I could tell KS to store certain "current" positions it encounters anyway while parsing the stream.
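The underlying difficulty is that positions inside a substream are relative to its first byte. A minimal Python sketch, with hypothetical helper names, of how a stored start position would translate them back into offsets in the parent byte[]:

```python
import io
from typing import Tuple


def read_substream(parent: io.BytesIO, size: int) -> Tuple[int, io.BytesIO]:
    """Isolate `size` bytes as a substream and remember where it began
    in the parent stream (what a stored "before" position would give)."""
    start = parent.tell()
    return start, io.BytesIO(parent.read(size))


def to_parent_offset(sub_start: int, sub_pos: int) -> int:
    # Positions inside a substream are relative to its first byte;
    # adding the substream's start yields an offset into the parent.
    return sub_start + sub_pos
```

With the start position stored, any offset found inside the substream can be used to patch the copied parent byte[] directly.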

ams-tschoening · Mar 28 '19 08:03

@ams-tschoening Does lea(config), based on #84, meet your needs? If so, I think we can close this as a duplicate.

Omar-Abdul-Azeez · Jan 14 '23 11:01

lea() returns the offset of the struct forming the scope, i.e. the stream position at which parsing of a value of this type started. lea(property_name) returns the offset of the named property: either the stream position at which parsing of that property started, or a position predicted from the known sizes of already parsed fields.
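The "predicted" case is simple arithmetic: a field's offset is the struct's start plus the sizes of the fields before it. A Python sketch of just that arithmetic, assuming fixed, known sizes; `predicted_offsets` is a hypothetical name, and this illustrates the idea rather than reproducing code from #84. (`accumulate(..., initial=...)` needs Python 3.8+.)

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def predicted_offsets(struct_start: int,
                      fields: List[Tuple[str, int]]) -> Dict[str, int]:
    """Predict each field's offset from the struct start and the known
    sizes of the fields preceding it."""
    names = [name for name, _ in fields]
    sizes = [size for _, size in fields]
    # offset of field i = struct_start + sum(sizes[:i])
    starts = accumulate(sizes[:-1], initial=struct_start)
    return dict(zip(names, starts))
```

For a struct starting at 10 with a 1-byte access_nr, a 1-byte status and a 2-byte config_field, this yields offsets 10, 11 and 12; any variable-size field breaks the prediction, which is when the actually recorded parse position is needed.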

Things read like they are what I need, but I don't have the time to dig further into this right now anyway. So feel free to close as a duplicate; it can always be reopened in the future.

ams-tschoening · Jan 14 '23 11:01