gherkin: Duplicate keywords in translations
Summary
In Behat/Gherkin we use the cucumber i18n data for translations. We have some tests that use this data to:
- Generate a gherkin string using the different variations of the keywords
- Generate a parsed AST that corresponds to that feature
- Check the parser creates a similar AST when consuming the gherkin
This tends to fail when there are duplicate keywords inside a language's translation, so we skip some languages when doing this test.
Currently we are excluding the following from the tests:
ne:अनीmeans bothAndandGivenuz:Агарmeans bothGivenandWhenen-old:Thameans bothWhenandThen
Impact
Aside from failing tests easily skipped, in Behat our AST retains the 'type' of the node, which in some cases will now be wrong vs what the author intended
Maybe more importantly, in those languages there is a word that means two different things. This may damage education efforts around the G/W/T structure.
Possible Solution
- fix the languages listed above to be unambiguous - this would have backwards compatibility issues so might not be acceptable
- ensure via automated tests that duplicated words cannot be added to new or existing translations
I agree we should remove duplicates. We fixed a similar issue in https://github.com/cucumber/cucumber/commit/a54f32dd01d677ed32c5595ebf1c7aeec9283140 - see https://github.com/cucumber/cucumber/blob/master/gherkin/CHANGELOG.md#1501---2020-08-12 where we removed keywords that only differed by case.
We need to bump the major version when/if we fix this, but that's no big deal.
As the keyword would keep working for a step, I'm not even sure this creates a BC issue. They would still be usable for steps (as you would only remove one of the occurrences of the duplicate, not both). And to preserve BC with step type, it would be a matter of keeping the one corresponding to the step type being detected currently by cucumber implementations (but I think this is not even consistent between implementations, and may not cause issues)
@aslakhellesoy isn't this solved with the #1741 merge?