kaitai_struct_compiler
kaitai_struct_compiler copied to clipboard
Fixes/standardizes the behavior of a repetition '_index' within a 'repeat-until' expression
Defines/Standardizes the Behavior of a Repetition _index
within a repeat-until
Expression
NOTE: This PR standardizes the previously-undefined behavior of _index
within repeat_until
.
Fixes issue #958. Because this standardizes something that was previously undefined, this PR may
warrant further discussion beyond the scope of a typical code review.
Standardizing this Behavior
Need
Issue #958 presents a basic scenario which warrants using (and therefore defining the behavior of)
_index
within a repeat-until
expression. Generally speaking, any data structure which contains
a list of entries of a variable-size N
that is capable of terminating early (before N
items have been processed),
benefits from this functionality.
Possible Behaviors
Inclusive _index
~~(Implemented in this PR)~~
'Inclusive' means that _index
gives the total number of elements processed plus the current entry.
Thus, for the first item processed in a repeat-until
block, _index
will yield a value of 1.
This is in contrast to the typical evaluation of _index
outside a repeat-until
expression.
~~However, I believe this treatment of _index
within a repeat-until
expression is justified (and, in-general, the expected behavior) - see the Reasoning for Inclusive Evaluation section.~~
Exclusive _index
'Exclusive' means that _index
gives the total number of elements processed without counting the current entry.
Thus, for the first item processed in a repeat-until block, _index
will yield a value of 0.
This is the typical behavior of _index
in expressions outside repeat-until
blocks.
Current Behavior
Currently, _index
within repeat-until
blocks is evaluated differently depending on the language target, see issue #958.
-
C++:
_index
is inclusive -
C#:
_index
is inclusive -
Go:
_index
is inclusive both within and outsiderepeat-until
expressions. This is unlike every other language target where expressions (outsiderepeat-until
) treat_index
as exclusive (as they should). -
Java:
_index
is inclusive -
Javascript:
_index
is inclusive -
Lua:
_index
is exclusive -
Nim:
_index
is exclusive -
Perl: Use of
_index
withinrepeat-until
expressions results in code that does not compile -
PHP:
_index
is inclusive -
Python:
_index
is exclusive -
Ruby:
_index
is inclusive -
Rust: Use of
_index
withinrepeat-until
expressions results in code that does not compile
~~Reasoning for Inclusive Evaluation~~
NOTE: This PR now implements exclusive evaluation. See the discussion in the comments.
Typically, _index
is evaluated exclusively. This way, _index
gives the current index of the
entry being processed. My argument, however, is that by the time the condition within a
repeat-until
expression is being evaluated, the current item is already considered to be
processed.
This is supported by the fact that the contents of the current item
can be referred to by the _
reference within a repeat-until
expression - further reinforcing
that the current item is already be considered to have been 'processed' when evaluating
repeat-until
expression.
Inclusive evaluation is also consistent with most implementations of a 'do while' loop, a
structure to which a repeat-until
block is roughly analogous.
Finally, inclusive evaluation will provide coverage for the most common use case:
where the maximum number of repetitions (N
) is known, but has the potential to terminate early. Exclusive evaluation would require that the condition _index >= N - 1
is tested, rather than the cleaner _index >= N
.
Hey, thanks for looking into this and writing a detailed analysis and fix! Officially supporting _index
inside repeat-until
makes sense, especially because in kaitai_struct_formats we already have two specs that use this feature. In fact, these two specs apparently already expect two different meanings for _index
, so there's definitely a need to standardize this.
I don't quite follow the argument for the "inclusive" _index
behavior though. You are right that
by the time the condition within a
repeat-until
expression is being evaluated, the current item is already considered to be processed
But the repeat-until
condition is still evaluated in the context of that already processed item. In particular, the "current" variable _
refers to the item that was just parsed, and not the upcoming item that has not been parsed yet (obviously that would be quite useless). So I'd argue that _index
should match that and should be the index of the just parsed item, not the next one. IMO it would be counterintuitive to have _index
be something different than the index of _
in the array.
I do see the advantage that the "inclusive" behavior matches what you would get in a C-style loop, where the index is incremented before the loop condition is evaluated. This indeed also makes it easier to write a maximum array size in a repeat-until
condition. But I'm not sure if it's worth making _
and _index
inconsistent to achieve this.
Great find on the two specs which employ this already!
I've been thinking about this some more...
The repeat
functionality is quite limited currently. If this functionality is improved in the future such that its possible to specify whether a repeat
block is inclusive/exclusive and/or when the loop condition is evaluated (perhaps some sort of repeat-while
?) there may be new, cleaner ways which achieve the desired behavior (that being, terminating a variable-length list early).
If this is the case, perhaps _index
should remain exclusive to keep behavior consistent across expressions. This way, current developers have a (rather hacky) means of accomplishing this behavior, while leaving room for more elegant solutions to be delivered (perhaps ones that allow the user to specify exclusive/inclusive evaluation) in the future without breaking backward compatibility.
TLDR:
Maybe _index
should be exclusive.
Whatever the result, the behavior of _index
within repeat-until
should be documented in the user guide (possibly alongside an example).
I think that _index
should be exclusive, and if one needs _index + 1
as a count of already processed elements, he should be able to use _count
, and if that _count
is used, KSC should emit the code computing it.
I think that
_index
should be exclusive, and if one needs_index + 1
as a count of already processed elements, he should be able to use_count
, and if that_count
is used, KSC should emit the code computing it.
Introduction of a further parameter _count
is beyond the scope of this PR (at least, in my opinion). In general, I'm skeptical of introducing another parameter whose definition is simply _index + 1
. However, this is just my initial impression.
My suspicion is that many of the problems of the current representation could be addressed by introducing needed functionality into a repeat
block (perhaps something as you have mentioned here).
I would love to see a discussion between the senior maintainers and the user-base addressing the limitations of the repeat
functionality sometime in the future.
@dgelessus:
But the
repeat-until
condition is still evaluated in the context of that already processed item. In particular, the "current" variable_
refers to the item that was just parsed, and not the upcoming item that has not been parsed yet (obviously that would be quite useless). So I'd argue that_index
should match that and should be the index of the just parsed item, not the next one. IMO it would be counterintuitive to have_index
be something different than the index of_
in the array.
100% agree. I also think that _index
should be exclusive in repeat-until
- in other words, _
should be the same as items[_index]
. If the repeat-until
condition also depends on a previous element for example, that will be items[_index - 1]
(of course this can be used only for _index >= 1
, because there is no previous element for _index == 0
). (Needing previous item in repeat-until
appeared on Gitter and I came up with the solution where items[_index - 2]
is the previous element and items[_index - 1]
the current one because that's how the JavaScript code backing the Web IDE behaved, but it's certainly confusing and counter-intuitive.)
With the "inclusive" _index
, items[_index]
is undefined, because it would reference the future element - if we choose this behavior, there's a question what _index
actually stands for when it can't be used as a subscript of the array. Oh, it makes emulating repeat-expr
more "elegant" just by using repeat-until: _index == num_items
(instead of repeat-until: _index == num_items - 1
for exclusive _index
)? Once again, this looks weird to me too, because intuitively _index
is an index to the array (if _index
isn't an index, I don't know what it is) - it should be always in range 0 <= _index <= num_items - 1
, so _index == num_items
doesn't make sense and should never happen.
If you want to emulate repeat-expr
using repeat-until
(which may be useful if the repetition can be terminated before reaching the desired count when a special condition is fulfilled, as you describe in https://github.com/kaitai-io/kaitai_struct/issues/958), you'll want to finish on the last element, which has the index num_items - 1
, so repeat-until: _index >= num_items - 1
(or _index == num_items - 1
if you're sure that you won't somehow miss the specific index, which can be safely assumed here, because _index
is always increased by 1 and the end condition is checked in each iteration).
Looking at your list in https://github.com/kaitai-io/kaitai_struct_compiler/pull/245#issue-1191448714, this decision may break existing specs relying on the inclusive behavior, but I don't think there are that many specs using _index
in repeat-until
at all, so it won't matter so much. And we inevitably break some compatibility just by standardizing previously various/undefined behavior, so why not choose the one that makes more sense and will be less confusing to users (in my opinion).
@generalmimon
100% agree. I also think that
_index
should be exclusive inrepeat-until
- in other words,_
should be the same asitems[_index]
. If therepeat-until
condition also depends on a previous element for example, that will beitems[_index - 1]
(of course this can be used only for_index >= 1
, because there is no previous element for_index == 0
). (Needing previous item inrepeat-until
appeared on Gitter and I came up with the solution whereitems[_index - 2]
is the previous element anditems[_index - 1]
the current one because that's how the JavaScript code backing the Web IDE behaved, but it's certainly confusing and counter-intuitive.)
I failed to consider the more common use of _index
within a repeat-until
block (use as an actual index e.g., items[_index - 1]
) in my initial reasoning.
I now agree exclusive evaluation makes more sense and modified the PR to implement this. More considerable alterations were needed to achieve this result so I'm in the process of setting up the test suite and confirming proper operation.
#787 is related documenting issue