Normalizer mishandles "X%.", returns "X %."

Open ChanceNCounter opened this issue 3 years ago • 7 comments

normalize("Set Volume to 50%.") -> "Set Volume to 50 %."

This is bad. It should probably, at worst, return "Set Volume to 50 % ."

ChanceNCounter avatar May 17 '21 19:05 ChanceNCounter

Hi @ChanceNCounter I would like to work on this issue. As this would be my first contribution to this project, I'll complete the steps required to become a contributor and submit a PR shortly. :)

Badboy-16 avatar Jun 04 '21 15:06 Badboy-16

Sounds good! I think it should ideally maintain the percentage as such, meaning that when the normalized phrase is passed to a tokenizer, one of the tokens should be "50%". But that's my opinion.
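To make that expectation concrete, here is a minimal sketch of a tokenizer that keeps a percentage together as one token while still splitting off sentence punctuation. This is not lingua-franca's tokenizer; `tokenize_keep_percent` and its regex are hypothetical and only illustrate the desired result.

```python
import re

# Hypothetical pattern, with alternatives tried left to right:
#   1. a number (optionally with a decimal part) immediately followed by "%"
#   2. a run of word characters
#   3. any single character that is neither a word character nor whitespace
_TOKEN_RE = re.compile(r"\d+(?:[.,]\d+)?%|\w+|[^\w\s]")


def tokenize_keep_percent(text):
    """Split text into tokens, keeping percentages such as "50%" intact."""
    return _TOKEN_RE.findall(text)


print(tokenize_keep_percent("Set Volume to 50%."))
# ['Set', 'Volume', 'to', '50%', '.']
```

With a rule like this, the trailing period ends up as its own token instead of being glued to the percent sign.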

In the long run, the oddness of the current behavior aside, there might be a design choice to be made here: @krisgesling, what are your thoughts on the extractors and percentages?

ChanceNCounter avatar Jun 04 '21 19:06 ChanceNCounter

Yeah, agreed - the % is inherently tied to the number, e.g. it's not the same as "50 apples"; if anything it's closer to "0.5".

Thanks for digging into this @Badboy-16 :)

krisgesling avatar Jun 09 '21 02:06 krisgesling

Since the point of normalize was to make intent parsing etc. easier, this just makes it harder to detect numbers or percentages. E.g., a voc file containing "percent" and "%" will no longer match in Adapt, and anything downstream that depends on tokens being number words might also suddenly fail.

This change was intentionally part of the normalization process.

JarbasAl avatar Jun 11 '21 13:06 JarbasAl

> This change was intentionally part of the normalization process.

Okay but the current state of affairs is unacceptable.

ChanceNCounter avatar Jun 11 '21 15:06 ChanceNCounter

then normalize the symbol into a word
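One possible reading of this suggestion, sketched for English only: replace a "%" that follows a number with the word "percent", so downstream vocabulary matching sees a plain word. The helper `percent_symbol_to_word` is hypothetical and is not part of lingua-franca.

```python
import re

# Hypothetical rule: a "%" that follows a digit (optionally separated by
# whitespace) is rewritten as the word "percent", preceded by one space.
_PERCENT_RE = re.compile(r"(\d)\s*%")


def percent_symbol_to_word(text):
    """Replace a number's trailing "%" with the word "percent"."""
    return _PERCENT_RE.sub(r"\1 percent", text)


print(percent_symbol_to_word("Set Volume to 50%."))
# Set Volume to 50 percent.
```

A real implementation would need a per-language word for the symbol; the English string here is an assumption made for illustration.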

JarbasAl avatar Jun 11 '21 16:06 JarbasAl

I think we might be talking about different things here. The periods in the issue title are literal.

The normalizer handles "5%" correctly. It mishandles "5%.", returning "5 %."

"%." is nothing.

ChanceNCounter avatar Jun 11 '21 23:06 ChanceNCounter