Repetition with CompoundRule

Open cjbassi opened this issue 6 years ago • 14 comments

Is it possible to use repetition within a CompoundRule? Is there some syntax that can be used in the spec to achieve this? Is there any documentation for this that I missed? Thanks!

cjbassi avatar Apr 18 '20 16:04 cjbassi

Not currently. There has been some discussion of it in the past on Gitter (most of which I'm afraid I have forgotten). How to design it is a nontrivial question. It is definitely worth discussing here, as it is currently a pain point!

daanzu avatar Apr 19 '20 04:04 daanzu

Could you give an example of what you're trying to do?

mirober avatar Apr 19 '20 12:04 mirober

I'm trying to create a compound rule in which certain words can be repeated. For my use case, the syntax I would most prefer is word* for zero or more repetitions and word+ for one or more. Alternatively, the repetition could be specified in the extras, which would allow more precise control over the repetition parameters. Either way, it needs to work with other extras such as Choice.

cjbassi avatar Apr 19 '20 15:04 cjbassi

Ahh, I see. You can already use a Repetition to create a new extra that does this, but there is no spec syntax for it at the moment, though adding one would be possible.

* would need to produce Optional(Repetition(object)) and + would need to produce Repetition(object). The hard part is that we would probably also need to do some trickery with the names, so that you could refer to both the repeated version and the normal version of something with the same name.

mirober avatar Apr 19 '20 16:04 mirober
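
One way the suffix idea above might be lowered onto the existing spec language is to rewrite the spec before it is parsed. The following is only an illustrative sketch in plain Python, not dragonfly code: plan_extras, the "_rep" naming scheme, and the assumption that only <name> references can carry a suffix are all hypothetical. It relies only on the existing spec conventions that <name> refers to an extra and [...] marks an optional part.

```python
import re

# Hypothetical token shape: <name>, <name>* or <name>+ inside a spec string.
_TOKEN = re.compile(r"<(\w+)>([*+]?)")


def plan_extras(spec):
    """Rewrite * and + suffixes and list the repeated extras that are needed.

    Returns (rewritten_spec, plans), where each plan is a tuple of
    (base_name, repeated_name, optional).
    """
    plans = []

    def rewrite(match):
        name, suffix = match.group(1), match.group(2)
        if not suffix:
            return match.group(0)      # plain reference, left untouched
        repeated = name + "_rep"       # hypothetical naming scheme
        optional = suffix == "*"       # * means zero or more
        plans.append((name, repeated, optional))
        ref = "<%s>" % repeated
        return "[%s]" % ref if optional else ref

    return _TOKEN.sub(rewrite, spec), plans


print(plan_extras("press <key>+ then <mod>*"))
```

For each plan, a caller would presumably register a Repetition wrapping the base extra under repeated_name, with an empty-list default when optional is true. Using a separate name for the repeated form is one possible answer to the naming problem raised above, since the plain <name> reference stays available.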

Oh you can specify it as an extra already? Is there some documentation for this that I missed? I specifically need it to work with a Choice extra, if that's possible. Thanks for the help.

cjbassi avatar Apr 19 '20 16:04 cjbassi

Ah yeah, you're right: there is a docstring for it, but I can't find it in the docs. Basically it's just another extra that you can wrap around others. A couple of example usages:

from dragonfly import Choice, IntegerRef, Repetition

# Matches between one and four integers (max is exclusive)
Repetition(IntegerRef("", 0, 10), min=1, max=5, name="num_seq"),
# Matches between one and three alphabet characters in a Choice (max is exclusive)
Repetition(Choice("", {...}), 1, 4, "alphabet_seq"),

It will return a list of the matched elements, and you can specify a default as normal if it's going to be optional.

Docs: https://github.com/dictation-toolbox/dragonfly/blob/master/dragonfly/grammar/elements_basic.py#L563

mirober avatar Apr 19 '20 16:04 mirober

I got it to work based on the examples you provided, so thanks for that. At this point, the only question here is whether the CompoundRule should provide syntax support for * and +. I'm not really sure I have an opinion on this, but I think it might be nice.

cjbassi avatar Apr 20 '20 20:04 cjbassi

Use of Repetition elements, either for continuous command recognition (CCR) or for simpler use cases, is currently poorly documented.

@mrob95 Would you mind if your examples were included in the docstring for the Repetition element class?

> At this point, the only question here is whether the CompoundRule should provide syntax support for * and +. I'm not really sure I have an opinion on this, but I think it might be nice.

As @daanzu said, this has come up in the past and has unfortunately been forgotten about. I would be in support of adding this syntax to Compound specs. It would be nice to have!

drmfinlay avatar Apr 23 '20 14:04 drmfinlay

> At this point, the only question here is whether the CompoundRule should provide syntax support for * and +. I'm not really sure I have an opinion on this, but I think it might be nice.

I think this could come in handy fairly often. The main design question is how to handle the extras: does the extras variable get assigned an array of all of the matches? I seem to remember coming up with various possibilities, but I haven't considered it lately.

daanzu avatar Apr 24 '20 13:04 daanzu

I've already implemented this in an application I'm writing: https://github.com/osprey-voice/osprey/blob/master/osprey/voice.py#L138. In my implementation, both * and + return an array of the matches. * is implemented with an Optional, so I just set its default to [].

cjbassi avatar Apr 24 '20 15:04 cjbassi

Is it possible to have an unbounded number of repetitions? Otherwise we are going to have to put a limit on the number of repetitions for * and +. If that's the case, it might be better to not implement this feature in dragonfly since it would be opinionated.

cjbassi avatar May 22 '20 23:05 cjbassi

> Is it possible to have an unbounded number of repetitions?

For engines that support it (I think Kaldi and natlink), the optimize parameter makes the number unbounded, and ignores the min and max.

daanzu avatar May 23 '20 02:05 daanzu

That didn't work for me, unfortunately. It still checks max, which is 1 by default. Maybe we should add another Repetition parameter called unbounded, which would disable the max check.

cjbassi avatar May 25 '20 16:05 cjbassi
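
A minimal sketch of how the proposed unbounded flag might change the count check. This is plain Python for illustration, not dragonfly's actual implementation: count_allowed and its parameter names are hypothetical, while the exclusive upper bound and the max = min + 1 default follow the behaviour described elsewhere in this thread.

```python
def count_allowed(count, minimum=1, maximum=None, unbounded=False):
    """Return True if `count` repetitions would be accepted.

    `minimum` is inclusive and `maximum` is exclusive. When `maximum` is not
    given it falls back to minimum + 1, so exactly `minimum` repetitions pass.
    The hypothetical `unbounded` flag skips the upper-bound check entirely.
    """
    if maximum is None:
        maximum = minimum + 1
    if count < minimum:
        return False               # the lower bound always applies
    return unbounded or count < maximum


print(count_allowed(2))                   # defaults allow only one repetition
print(count_allowed(7, unbounded=True))   # upper bound is ignored
```

Keeping the lower bound active even when unbounded is set would let the same flag serve both a + style (minimum of 1) and, wrapped in an Optional, a * style.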

Most regex languages support {min,max} or {,max} notation for bounded repetitions; in the short term, that fits the model already in place. As for dragonfly, unbounded wouldn't be hard to add to Repetition. Alternatively, there is a natural bound on repetitions, namely the number of recognized words (except in the trivial case of repeating an empty element). Thus we don't need to worry about infinite loops while decoding.

Perhaps this should be a separate issue, but I should also point out that the dfly convention of making min inclusive and max exclusive in repetitions is both nonstandard compared to regex languages and very confusing. Reading the documentation saying the default is max=min+1 hurts my head :(

kb100 avatar Jul 23 '20 22:07 kb100
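
To make the difference between the two conventions concrete, here is a small sketch that converts regex-style inclusive bounds to a (min, max) pair with an exclusive max, next to the equivalent regex quantifier from Python's re module. The helper name inclusive_to_exclusive is hypothetical and not part of dragonfly.

```python
import re


def inclusive_to_exclusive(lo, hi):
    """Map regex-style inclusive bounds {lo,hi} to (min, max) with exclusive max."""
    if hi < lo:
        raise ValueError("upper bound must not be below lower bound")
    return lo, hi + 1


# Regex counterpart: {1,4} accepts one to four copies, inclusive on both ends.
ONE_TO_FOUR = re.compile(r"(?:ab){1,4}")

print(inclusive_to_exclusive(1, 4))
print(bool(ONE_TO_FOUR.fullmatch("ab" * 4)), bool(ONE_TO_FOUR.fullmatch("ab" * 5)))
```

Under this mapping, a regex-style {1,4} corresponds to the min=1, max=5 arguments used in the earlier example that matches between one and four integers.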