Repetition with CompoundRule
Is it possible to use repetition within the CompoundRule? Is there some syntax that can be used in the spec to achieve this? Is there any documentation for this that i missed? Thanks!
Not currently. There has been some discussion of it in the past on Gitter (most of which I'm afraid I have forgotten). The question of how to design it is non trivial. It is definitely worth discussing here, as it is a pain point currently!
Could you give an example of what you're trying to do?
I'm trying to create a compound rule where certain words can be repeated. For my use case, the syntax that I would most prefer is word* for repetitions of 0 or more and word+ for repetitions of 1 or more. Although maybe we need to instead specify the repetition in the extras, so that you can be more precise with specifying the repetition parameters. Also, it needs to work with other extras like Choice.
Ahh I see. You can obviously use a Repetition to create a new extra which does this but there is no syntax at the moment, though it would be possible.
* would need to produce Optional(Repetition(object)) and + Repetition(object). Hard part is that we would probably also need to do some trickery with the names so that you could refer to both the repeated version and the normal version of something with the same name.
Oh you can specify it as an extra already? Is there some documentation for this that I missed? I specifically need it to work with a Choice extra, if that's possible. Thanks for the help.
Ah yeah you're right, there is a docstring for it but I can't find it in the docs. Basically it's just another extra you can wrap around others, couple of example usages:
# Matches between one and four integers
Repetition(IntegerRef("", 0, 10), min=1, max=5, name="num_seq"),
# Matches between one and three alphabet characters in a Choice
Repetition(Choice("", {...}), 1, 4, "alphabet_seq"),
It will return a list of the matched elements, and you can specify a default as normal if it's going to be optional.
Docs: https://github.com/dictation-toolbox/dragonfly/blob/master/dragonfly/grammar/elements_basic.py#L563
I got it to work based on the examples you provided, so thanks for that. At this point, the only question here is whether the CompoundRule should provide syntax support for * and +. I'm not really sure i have an opinion on this, but I think it might be nice.
Use of Repetition elements, either for continuous command recognition (CCR) or for more simple use cases, is currently poorly documented.
@mrob95 Would you mind if your examples were included in the docstring for the Repetition element class?
At this point, the only question here is whether the CompoundRule should provide syntax support for
*and+. I'm not really sure i have an opinion on this, but I think it might be nice.
As @daanzu said, this has come up in the past and has unfortunately been forgotten about. I would be in support of adding this syntax to Compound specs. It would be nice to have!
At this point, the only question here is whether the CompoundRule should provide syntax support for
*and+. I'm not really sure i have an opinion on this, but I think it might be nice.
I think this could come in handy fairly often. The main design question is how to handle the extras: does the extras variable get assigned an array of all of the matches? I seem to remember coming up with various possibilities, but I haven't considered it lately.
I've already implemented this in an application I'm writing: https://github.com/osprey-voice/osprey/blob/master/osprey/voice.py#L138, and the way I've implemented it is both * and + return an array of the matches. * is implemented with an Optional so I just set it's default to [].
Is it possible to have an unbounded number of repetitions? Otherwise we are going to have to put a limit on the number of repetitions for * and +. If that's the case, it might be better to not implement this feature in dragonfly since it would be opinionated.
Is it possible to have an unbounded number of repetitions?
For engines that support it (I think Kaldi and natlink), the optimize parameter makes the number unbounded, and ignores the min and max.
That didn't work for me unfortunately. It still checks max which is 1 by default. Maybe we should add another repetition parameter called unbounded, which makes it so that it doesn't check max.
Most regex languages support using {min,max} or {,max} notation for bounded repetitions, in the short term that fits the model already in place. As for dragonfly, unbounded wouldnt be hard to add to repetition. Alternatively, there is a natural bound on repetitions, namely the number of recognized words (except in the trivial repetition of empty element). Thus we dont need to worry about infinite loops while decoding.
Perhaps this should be a separate issue, but I should also point out that the dfly convention of making min inclusive and max exclusive in repetitions is both nonstandard compared to regex languages and very confusing. Reading the documentation saying the default is max=min+1 hurts my head :(