Kleene star matches too greedily

Open asteriskblue opened this issue 10 months ago • 3 comments

The documentation describes the following sound change:

categories
C = p t k b d g m n ŋ f s h
Nasl = m n ŋ
VStp = b d g

V = a e i o u
end

VStp / Nasl / Nasl [C V -h]* _

; mate → mate (no change)
; matebede → matemene
; matebehede → matenehede

However, I get mate matebede matebehede (no changes) when I run it. Here's another example:

categories
C = p t k b d g m n ŋ f s h
V = a e i o u
end

[e o] / [i u] / _ [C V]* i

; meki -> miki
; meski -> miski
; meaki -> miaki

Result: meki meski meaki (no changes)

[e o] / [i u] / _ C* i works as expected, but changing C* to [C V]* breaks the rule.

asteriskblue avatar Jan 29 '25 06:01 asteriskblue

Well spotted! I actually have automated tests which make sure all the examples work as expected, but on closer inspection this one had a typo which caused it to be skipped — and it’s just my bad luck that there was a bug lurking here.

I’ll have a closer look at it when I get time.

bradrn avatar Jan 29 '25 08:01 bradrn

On further investigation, this isn’t technically a bug. In the second sound change (which is simpler), what’s happening is that [C V]* matches all consonants and vowels, including ⟨i⟩. So the sound change gets to the end of the word without actually matching i.
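To make the behaviour concrete, here is a toy model in Python of a greedy star that never gives back what it has consumed. It is only a sketch of the idea described above, not Brassica's actual implementation; the function names `greedy_star` and `star_then_i` are made up for illustration.

```python
# Toy model only: illustrates a greedy, non-backtracking star.
# This is NOT Brassica's real matching code.
C = set("ptkbdgmnŋfsh")
V = set("aeiou")

def greedy_star(word, pos, allowed):
    """Advance past every consecutive symbol in `allowed`, never giving any back."""
    while pos < len(word) and word[pos] in allowed:
        pos += 1
    return pos

def star_then_i(word, pos, allowed):
    """Check the environment `allowed* i` starting at word[pos], without backtracking."""
    pos = greedy_star(word, pos, allowed)
    return pos < len(word) and word[pos] == "i"

# In "meki", the environment after the ⟨e⟩ starts at index 2.
with_vowels = star_then_i("meki", 2, C | V)  # star swallows "ki", so no i is left
consonants_only = star_then_i("meki", 2, C)  # star stops before "i", so i matches

print(with_vowels, consonants_only)
```

In this model `[C V]*` fails while `C*` succeeds, which mirrors the results reported in the issue.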

The second sound change can thus be fixed by rewriting it to [e o] / [i u] / _ [C V -i]* i. Similarly, the first one should be rewritten as VStp / Nasl / Nasl [C V -VStp -h]* _. (To make it even more confusing, there was a typo in the sample output: I wrote ⟨matenehede⟩ for ⟨matemehede⟩.)
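The effect of excluding ⟨i⟩ from the starred category can be sketched with the same kind of toy model. The Python below is an assumption-laden illustration of the second rule only (the helper `raise_before_i` is hypothetical and ignores everything else Brassica does); it compares `[C V]*` with `[C V -i]*`.

```python
# Toy model only: a simplified stand-in for `[e o] / [i u] / _ X* i`,
# where the star is greedy and never backtracks. Not Brassica's real code.
C = set("ptkbdgmnŋfsh")
V = set("aeiou")
RAISE = {"e": "i", "o": "u"}

def raise_before_i(word, star_set):
    """Raise e/o when followed by `star_set* i`, using a non-backtracking star."""
    out = list(word)
    for idx, ch in enumerate(word):
        if ch not in RAISE:
            continue
        pos = idx + 1
        # The star consumes as much as it can and keeps all of it.
        while pos < len(word) and word[pos] in star_set:
            pos += 1
        # Only now is the final `i` checked.
        if pos < len(word) and word[pos] == "i":
            out[idx] = RAISE[ch]
    return "".join(out)

words = ["meki", "meski", "meaki"]
original = [raise_before_i(w, C | V) for w in words]        # models [C V]*
fixed = [raise_before_i(w, (C | V) - {"i"}) for w in words]  # models [C V -i]*

print(original)
print(fixed)
```

With the exclusion in place, the star has to stop at the first ⟨i⟩, so the final element of the environment still has something to match.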

That being said, this behaviour has already been reported in #5. It’s sufficiently confusing that even I have gotten confused about it, now for the third time. So I’d better look into changing how the star works.

bradrn avatar Feb 01 '25 02:02 bradrn

@Xwtek sent me yet another excellent example of this problem:

categories
-stress = a ə ɛ e i į ɔ o u ų
+stress = á ə́ ɛ́ é í į́ ɔ́ ó ú ų́
V = &&stress
C = m p b v n t d r ts dz s z ŋ k g ɣ q ʁ ʔ h
end

-rtl -stress / +stress / _ C C* -stress C C* +stress

The right-to-left matching makes this one particularly subtle. Brassica encounters C* first: thus this will match all the consonants, such that the preceding C is never matched. The rule can be fixed by reversing the order, C* C.
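The ordering problem can be sketched in the same way. The Python below is a toy model, not Brassica's implementation: the pattern is listed in the order the elements are encountered during matching, so under -rtl a rule written as C C* corresponds to the star-first case here. The consonant set `CONS` and the helper `match_seq` are hypothetical and chosen only for illustration.

```python
# Toy model only: elements are matched in order of encounter,
# and a starred element is greedy and never backtracks.
CONS = set("ptkmns")  # hypothetical small consonant set

def match_seq(symbols, pattern):
    """Match `pattern`, a list of (allowed_set, is_star) pairs, at the start of `symbols`."""
    pos = 0
    for allowed, is_star in pattern:
        if is_star:
            # Consume every consecutive allowed symbol and keep all of them.
            while pos < len(symbols) and symbols[pos] in allowed:
                pos += 1
        elif pos < len(symbols) and symbols[pos] in allowed:
            pos += 1
        else:
            return False
    return True

cluster = "st"
star_first = match_seq(cluster, [(CONS, True), (CONS, False)])  # star encountered first
star_last = match_seq(cluster, [(CONS, False), (CONS, True)])   # plain C encountered first

print(star_first, star_last)
```

When the star is encountered first it consumes the whole cluster and the plain C that follows has nothing left to match; when the plain C comes first, both elements succeed.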

(By the way, I’m going to re-open this issue, to have a dedicated thread for such issues with the Kleene star: even though it’s already been mentioned in #5, that thread is mostly focussed on optional categories, which are a different issue.)

bradrn avatar Feb 15 '25 07:02 bradrn