common icon indicating copy to clipboard operation
common copied to clipboard

gherkin: Duplicate keywords in translations

Open ciaranmcnulty opened this issue 4 years ago • 3 comments

Summary

In Behat/Gherkin we use the cucumber i18n data for translations. We have some tests that use this data to:

  • Generate a gherkin string using the different variations of the keywords
  • Generate a parsed AST that corresponds to that feature
  • Check the parser creates a similar AST when consuming the gherkin

This tends to fail when there are duplicate keywords inside a language's translation, so we skip some languages when doing this test.

Currently we are excluding the following from the tests:

  • ne: अनी means both And and Given
  • uz: Агар means both Given and When
  • en-old: Tha means both When and Then

Impact

Aside from failing tests easily skipped, in Behat our AST retains the 'type' of the node, which in some cases will now be wrong vs what the author intended

Maybe more importantly, in those languages there is a word that means two different things. This may damage education efforts around the G/W/T structure.

Possible Solution

  1. fix the languages listed above to be unambiguous - this would have backwards compatibility issues so might not be acceptable
  2. ensure via automated tests that duplicated words cannot be added to new or existing translations

ciaranmcnulty avatar Feb 01 '21 15:02 ciaranmcnulty

I agree we should remove duplicates. We fixed a similar issue in https://github.com/cucumber/cucumber/commit/a54f32dd01d677ed32c5595ebf1c7aeec9283140 - see https://github.com/cucumber/cucumber/blob/master/gherkin/CHANGELOG.md#1501---2020-08-12 where we removed keywords that only differed by case.

We need to bump the major version when/if we fix this, but that's no big deal.

aslakhellesoy avatar Feb 01 '21 20:02 aslakhellesoy

As the keyword would keep working for a step, I'm not even sure this creates a BC issue. They would still be usable for steps (as you would only remove one of the occurrences of the duplicate, not both). And to preserve BC with step type, it would be a matter of keeping the one corresponding to the step type being detected currently by cucumber implementations (but I think this is not even consistent between implementations, and may not cause issues)

stof avatar Feb 17 '21 10:02 stof

@aslakhellesoy isn't this solved with the #1741 merge?

ehuelsmann avatar May 25 '22 16:05 ehuelsmann