Table lookups instead of enums?

Open larsks opened this issue 4 years ago • 13 comments

I'm reading in some data that represents certain values using what is essentially a table lookup. The data may contain a value in the range 0..5, and each value is effectively an index into a list such as [118, 144, 220, 300, 430, 1200]. For example, if the data contains 1, that corresponds to a value of 144.

I would love to be able to represent this in kaitai in a manner similar to enums. Something like:

tables:
  frequency:
    0: 118
    1: 144
    2: 220
    3: 300
    4: 430
    5: 1200
seq:
  - id: frequency
    type: u1
    table: frequency

The only alternative I see right now is to create an enum-based solution, like:

enums:
  frequency:
    0: freq_118
    1: freq_144
    2: freq_220
    3: freq_300
    4: freq_430
    5: freq_1200

...but while that leads to better readability, it doesn't actually help all that much because I would still need to maintain an external mapping table to map the enum to an actual value.

larsks avatar Oct 05 '19 17:10 larsks

In this particular example, you can technically get away with a lookup in an array literal, i.e.

instances:
  freqs:
    value: '[118, 144, 220, 300, 430, 1200]'
  freq_decoded:
    value: freqs[frequency]

GreyCat avatar Oct 06 '19 01:10 GreyCat

I thought about that, but there are situations where I would like to use the same table in multiple places. I guess I could make it some sort of top-level value and then use value: _root.tables.freq[frequency].
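A minimal sketch of that idea, assuming a hypothetical spec named radio_memory with a hypothetical channel subtype (neither name comes from the actual format). The table is a value instance on the root type, so nested types can reach it through _root:

```yaml
meta:
  id: radio_memory        # hypothetical spec name
seq:
  - id: channels
    type: channel
    repeat: expr
    repeat-expr: 2        # arbitrary count, for illustration only
instances:
  freq_table:             # shared lookup table, defined once on the root type
    value: '[118, 144, 220, 300, 430, 1200]'
types:
  channel:                # hypothetical subtype that reuses the table
    seq:
      - id: frequency
        type: u1          # raw index 0..5 as stored in the data
    instances:
      freq_decoded:
        value: _root.freq_table[frequency]
```

With this layout, a frequency byte of 1 would decode to 144, and any other type can reuse the same table via _root.freq_table. An index outside 0..5 is not handled here.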

larsks avatar Oct 06 '19 02:10 larsks

You can see an implementation of that idea here and here.

larsks avatar Oct 06 '19 03:10 larsks

Could you explain in a little more detail how you use these values, i.e. are they for presentation only, or do you need them to continue parsing?

GreyCat avatar Oct 06 '19 15:10 GreyCat

They're not for "presentation", but neither are they required for further parsing. They are the real values that the numeric constants in the binary data represent: that is, they are the values that external code is going to care about. For example, if I want to configure an SDR to scan the transmission band to which a radio is currently tuned, I need to know that the radio is on the 144 MHz band, not 1, which is what is stored in the radio memory.

I could just maintain the lookup tables in external code, but this means that anyone using the ksy file would need to re-implement the same lookup tables. Having them embedded in the ksy file means they only need to be implemented in one place, reducing the risk of weird typos or other inconsistencies.

larsks avatar Oct 06 '19 17:10 larsks

For LUTs it's more sensible to use switch-on-cases syntax.

KOLANICH avatar Oct 06 '19 17:10 KOLANICH

@KOLANICH I'm not really sure how that would look. Can you share an example? I mean, I've used switch-on for type switching, but I'm not sure how that would be used in place of the lookup tables I'm currently using.

larsks avatar Oct 10 '19 23:10 larsks

instances:
  freq_decoded:
    value:
      switch-on: frequency
      cases:
        0: 118
        1: 144
        2: 220
        3: 300
        4: 430
        5: 1200

KOLANICH avatar Oct 11 '19 06:10 KOLANICH

@KOLANICH Thanks! I had no idea you could do that. The user guide only shows switch-on in the context of selecting a type. I'll try that out.

larsks avatar Oct 11 '19 13:10 larsks

Oh wait, I think you're suggesting a possible future syntax to implement this! If that's the case, :+1: to that idea, because it would work really well.
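If value does not accept switch-on in the compiler version at hand, the same mapping can be approximated with the ternary operator of the expression language. A sketch, assuming a hypothetical spec name and using 0 as an arbitrary fallback for unknown indices:

```yaml
meta:
  id: freq_ternary_demo   # hypothetical spec name
seq:
  - id: frequency
    type: u1              # raw index as stored in the data
instances:
  freq_decoded:
    # Folded scalar: YAML joins these lines into one expression string.
    value: >-
      frequency == 0 ? 118 :
      (frequency == 1 ? 144 :
      (frequency == 2 ? 220 :
      (frequency == 3 ? 300 :
      (frequency == 4 ? 430 :
      (frequency == 5 ? 1200 : 0)))))
```

This is more verbose than a cases block and gets unwieldy for large tables, but it keeps the mapping inside the ksy file.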

larsks avatar Oct 11 '19 14:10 larsks

Note that the workaround I'm using (lookup tables as value instances) appears to be buggy, see #632.

larsks avatar Oct 11 '19 21:10 larsks

A lot of formats are built on the TLV (type-length-value) pattern. Some generic container formats, such as RIFF, share the same header, but the type-to-value mapping is specialized by each format that uses that header.

Having a global tag-chunk mapping can work, but it may be non-optimal.

So it is proposed to encapsulate the tag-type mapping in an object:

lut:
  key: u1 # type of the key
  cases:
    0x1: b

and then pass it as a template argument

types:
  b:
    template:
      - id: type_map
        type: lut<u1, type>
    seq:
      - id: b
        type: u1
      - id: c
        switch-on: b
        cases: type_map

A type map is a LUT whose second component is a type. Factory functions should be generated for type maps if needed. Regular switch-on/cases constructs should be treated as anonymous type maps.

LUTs should allow operations like intersections and unions. The results should be kept as an abstract representation, not immediately merged. How KSC implements lookup in a compound LUT depends on the situation. For example, for a mapping with non-contiguous keys it can synthesize a perfect hash function. For a union, KSC can generate lookups in two different tables, one after the other. In some languages it can optimize by exploiting contiguous memory: put both tables into a single array and address each one as a part of it. Or, if it considers that cheaper (e.g. for an intersection), it can merge them into a new full-fledged table, with a new perfect hash if necessary.

KOLANICH avatar Jan 22 '22 07:01 KOLANICH

Also, should this issue be renamed to "Look-up tables (LUTs)/mappings as first-class citizens"?

KOLANICH avatar Jan 22 '22 08:01 KOLANICH