Kleene star matches too greedily
The documentation describes the following sound change:
categories
C = p t k b d g m n ŋ f s h
Nasl = m n ŋ
VStp = b d g
V = a e i o u
end
VStp / Nasl / Nasl [C V -h]* _
; mate → mate (no change)
; matebede → matemene
; matebehede → matenehede
However, I get mate matebede matebehede (no changes) when I run it. Here's another example:
categories
C = p t k b d g m n ŋ f s h
V = a e i o u
end
[e o] / [i u] / _ [C V]* i
; meki -> miki
; meski -> miski
; meaki -> miaki
Result: meki meski meaki (no changes)
[e o] / [i u] / _ C* i works as expected, but changing C* to [C V]* breaks the rule.
Well spotted! I actually have automated tests which make sure all the examples work as expected, but on closer inspection this one had a typo which caused it to be skipped — and it’s just my bad luck that there was a bug lurking here.
I’ll have a closer look at it when I get time.
On further investigation, this isn’t technically a bug. In the second sound change (which is simpler), what’s happening is that [C V]* matches all consonants and vowels, including ⟨i⟩. So the star consumes everything up to the end of the word, leaving nothing for the final i to match.
The second sound change can thus be fixed by rewriting it to [e o] / [i u] / _ [C V -i]* i. Similarly, the first one should be rewritten as VStp / Nasl / Nasl [C V -VStp -h]* _. (To make it even more confusing, there was a typo in the sample output: I wrote ⟨matenehede⟩ for ⟨matemehede⟩.)
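To make the behaviour concrete, here is a minimal Python sketch of a greedy star that never backtracks. It is a toy model and not Brassica's actual implementation: the names `match_after` and `raise_vowels` are made up for illustration, and it only handles the single rule shape `[e o] / [i u] / _ X* i`.

```python
# Toy model only: this is NOT how Brassica is implemented internally.
C = set("ptkbdgmnŋfsh")
V = set("aeiou")

def match_after(word, start, star_set, final):
    """Greedily match `star_set* final` from word[start], without backtracking."""
    pos = start
    # The star consumes every grapheme it can and never gives any back.
    while pos < len(word) and word[pos] in star_set:
        pos += 1
    # Whatever is left must begin with the final grapheme.
    return pos < len(word) and word[pos] == final

def raise_vowels(word, star_set):
    """Apply the hypothetical rule [e o] / [i u] / _ star_set* i to `word`."""
    raised = {"e": "i", "o": "u"}
    out = []
    for idx, ch in enumerate(word):
        if ch in raised and match_after(word, idx + 1, star_set, "i"):
            out.append(raised[ch])
        else:
            out.append(ch)
    return "".join(out)

for word in ["meki", "meski", "meaki"]:
    broken = raise_vowels(word, C | V)           # [C V]* swallows the final i
    fixed = raise_vowels(word, (C | V) - {"i"})  # [C V -i]* stops before it
    print(word, broken, fixed)
# prints:
# meki meki miki
# meski meski miski
# meaki meaki miaki
```

In this model, excluding ⟨i⟩ from the starred set is what lets the final i match, which mirrors the rewritten rule above.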
That being said, this behaviour has already been reported in #5. It’s sufficiently confusing that even I have gotten confused about it, now for the third time. So I’d better look into changing how the star works.
@Xwtek sent me yet another excellent example of this problem:
categories
-stress = a ə ɛ e i į ɔ o u ų
+stress = á ə́ ɛ́ é í į́ ɔ́ ó ú ų́
V = &&stress
C = m p b v n t d r ts dz s z ŋ k g ɣ q ʁ ʔ h
end
-rtl -stress / +stress / _ C C* -stress C C* +stress
The right-to-left matching makes this one particularly subtle. Going right to left, Brassica encounters C* first, so the star matches all the consonants and nothing is left for the preceding C to match. The rule can be fixed by reversing the order to C* C.
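The ordering effect can be sketched in Python under some loud assumptions: right-to-left matching is modelled simply as reversing both the input and the pattern, the star is greedy with no backtracking, and only single-character consonants are used (so ⟨ts⟩ and ⟨dz⟩ are left out). The helper names `greedy_match` and `match_rtl` are hypothetical and not part of Brassica.

```python
# Toy model only: NOT Brassica's actual matcher.
# Single-character consonants only, to keep the example simple.
C = set("mpbvntdrszŋkgɣqʁʔh")

def greedy_match(graphemes, pattern):
    """Match `pattern` against the start of `graphemes`, greedily, no backtracking.

    `pattern` is a list of (allowed_set, is_starred) pairs.
    """
    pos = 0
    for allowed, is_starred in pattern:
        if is_starred:
            # A starred element consumes as much as it can.
            while pos < len(graphemes) and graphemes[pos] in allowed:
                pos += 1
        elif pos < len(graphemes) and graphemes[pos] in allowed:
            pos += 1
        else:
            return False
    return True

def match_rtl(graphemes, pattern):
    """Right-to-left matching, modelled as reversing both input and pattern."""
    return greedy_match(graphemes[::-1], pattern[::-1])

cluster = list("nt")
c_then_star = [(C, False), (C, True)]  # C C*
star_then_c = [(C, True), (C, False)]  # C* C

print(greedy_match(cluster, c_then_star))  # left-to-right, C C*
print(match_rtl(cluster, c_then_star))     # right-to-left, C C*
print(match_rtl(cluster, star_then_c))     # right-to-left, C* C
# prints:
# True
# False
# True
```

Under these assumptions, C C* fails right to left because the star is reached first and eats the whole cluster, whereas C* C succeeds because the plain C is reached first.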
(By the way, I’m going to re-open this issue, to have a dedicated thread for such issues with the Kleene star: even though it’s already been mentioned in #5, that thread is mostly focussed on optional categories, which are a different issue.)