
Support non-consecutive multi-word terms

Open paradoja opened this issue 2 years ago • 6 comments

Is your feature request related to a problem? Please describe.

Germanic languages like German or Dutch have words, in particular verbs, whose parts can be separated and spread across a sentence.

For example: "Ich lade dich zu meiner Party ein." means "I invite you to my party." The verb is "einladen", but in the sentence "lade" and "ein" are separated (this is not only common, it is mandatory per the grammar). These kinds of verbs are very common.

Describe the solution you'd like

The current multi-word selection doesn't work for this, and shift+clicking on words is already used for bulk selection. Maybe alt+clicking or some other combination could work.

Describe alternatives you've considered

I don't think there are any, or at least I can't imagine one. The suggested solution might work for adding terms, but I have no idea how it would work for showing the information back, to be honest. The main reason for the feature request is in case I'm missing something obvious that can work as a solution (short of trying to parse natural grammars).

paradoja avatar Aug 03 '23 20:08 paradoja

Hi @paradoja, thank you very much for filing the issue with the example. You're not missing anything obvious. This is a real challenge to do effectively, and will require some big hard thinking! As far as I know, no tool out there (LingQ, LWT, etc) does this correctly. Below are some disorganized thoughts for everyone's reference, including my own.

For this discussion, I'll call Non-Consecutive Multi-Word terms NCMW's (NickMows) for brevity :-) .

Challenges

Parsing challenges

Currently, Lute does term matching just by comparing text, ignoring case, with no understanding of the sentence. With a sentence like "Ich trinke Bier", it's easy to search for "trinke". With something like "Ich lade dich zu meiner Party ein," Lute would need to be clever enough to associate the correct "ein" with the "lade", and ignore the rest. And if a sentence contained multiple NCMW's, like "Ich lade dich zu meiner Party ein, und du lädst mich zu deiner Party ein," it would need to match the first "ein" with "lade", and the second with "lädst" -- it seems that it would need to understand that the sentence is two separate clauses joined by "und".
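To make the gap concrete, here is a minimal sketch of case-insensitive, consecutive-token matching. It is not Lute's actual code; `tokenize` and `find_contiguous` are hypothetical names used only for illustration. It finds "trinke" but cannot find "lade ... ein", because the parts are not adjacent.

```python
import re

def tokenize(text):
    # Lowercase and split into word tokens (hypothetical helper).
    return re.findall(r"\w+", text.lower())

def find_contiguous(term, sentence):
    """Return start indices where the term's tokens appear consecutively."""
    term_toks = tokenize(term)
    sent_toks = tokenize(sentence)
    if not term_toks:
        return []
    n = len(term_toks)
    return [
        i for i in range(len(sent_toks) - n + 1)
        if sent_toks[i:i + n] == term_toks
    ]

print(find_contiguous("trinke", "Ich trinke Bier"))
print(find_contiguous("lade ein", "Ich lade dich zu meiner Party ein."))
```

The second call returns an empty list, which is the behaviour this issue is about.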

I don't know if there are cases where a single sentence (of one clause) can contain multiple NCMWs, with a bunch of separable parts at the end of the sentence -- I don't believe so.

Correcting bad parses

Let's suppose that Lute did support NCMWs. Also suppose that for a given sentence, Lute parses it incorrectly; we'd need a way for the user to correct the bad parsing, and to have that correction persist. e.g. given bad German like "Ich lade dich zu ein Party ein," (I know that's incorrect German, just illustrating the case), Lute might incorrectly link the term "Ich LADE dich zu EIN Party ein" -- and the user would somehow need to tell Lute that it should be "Ich LADE dich zu ein Party EIN."
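One way a persisted correction could look, as a rough sketch only: store the token positions the user confirmed, keyed by the sentence text, and prefer them over the automatic parse. The `overrides` dict and both function names are assumptions for illustration, not anything that exists in Lute.

```python
# Hypothetical override store: sentence text -> token indices the user confirmed.
overrides = {}

def record_correction(sentence, indices):
    # Remember which tokens the user said belong together.
    overrides[sentence] = tuple(indices)

def resolve(sentence, parsed_indices):
    """Prefer the user's stored correction over the automatic parse."""
    return overrides.get(sentence, tuple(parsed_indices))

sentence = "Ich lade dich zu ein Party ein"
bad_parse = (1, 4)                      # "lade" linked to the first "ein"
record_correction(sentence, (1, 6))     # user links "lade" to the final "ein"
print(resolve(sentence, bad_parse))
```

Keying on the raw sentence text is the simplest choice; a real implementation would probably need something sturdier, such as a sentence id.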

Data storage

Do all of the permutations of a term-and-particle need to be created and stored?

  • Maybe "[ab]geben" would automatically have "gibst...ab" be created when "geben"=>"gibst" is created. I'm not sure if that's possible, a bad idea, etc.
  • Maybe "gibst...ab" wouldn't even need to be created, maybe it's implicitly present when "geben"=>"gibst" is created, if "[ab]geben" is present.
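The second bullet (implicit forms) might be sketched like this: keep conjugated forms and separable particles per parent, and derive the combinations on demand instead of storing each one. The `FORMS` and `PARTICLES` tables and `implied_ncmws` are made-up names and sample data, purely to illustrate the idea.

```python
# Hypothetical data: parent verb -> conjugated forms, parent verb -> separable particles.
FORMS = {"geben": ["gebe", "gibst", "gibt"]}
PARTICLES = {"geben": ["ab", "aus"]}

def implied_ncmws(parent):
    """Yield (form, particle, lemma) triples without storing each combination."""
    for particle in PARTICLES.get(parent, []):
        for form in FORMS.get(parent, []):
            # e.g. ("gibst", "ab", "abgeben") stands for "gibst...ab"
            yield form, particle, particle + parent

for form, particle, lemma in implied_ncmws("geben"):
    print(f"{form}...{particle} ({lemma})")
```

The trade-off is that every form/particle pair is treated as valid, which may or may not be acceptable.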

Some blue-sky thoughts about user experience

Note: I have no idea if any of the below is possible with Lute as it is, I'm just thinking about how things might work for me when reading German. I'm not studying German yet, and so haven't really thought about this issue in concrete terms, either from usability or from implementation. Real coding progress will only happen when I or another developer has some real skin in the game :-) because it's a tough problem!

Listing possible parents in dropdown

For example, I'm reading the text "Gibst du mir ein Stück von deinem Kuchen ab?" (Are you going to give me a piece of your cake?) Perhaps Lute sees "gibst", which has parent term "geben", and parent term "geben" somehow indicates that it can have a particle: "geben/abgeben/ausgeben/etc." So, I hover over "gibst" and it shows "geben", "[ab]geben", "[aus]geben" etc. All of these "[ab]geben", "[aus]geben" would need to be defined -- I'm not sure yet how Lute would indicate the separable particle "ab/aus" etc. Then, maybe there is a way to flag this instance of "gibst" in this particular sentence as "gibst...ab (abgeben)", so that in the future when I'm looking for examples for "abgeben", it returns this sentence.

Selecting non-contiguous parts

Reading "Gibst du mir ein Stück von deinem Kuchen ab?", I somehow select the parts "Gibst" and "ab", and that creates a term "gibst...ab" or something like it. Then I say the parent term is "[ab]geben" or "abgeben", etc etc. Then with new sentence "Gibst du mir ein Stück von deinem XYZ ab?" -- how to handle? If I had terms "gibst", "gibst...ab" and "gibst...aus" defined, I don't know if I'd want Lute to try to parse all possible matches, due to parsing difficulty.

What matters when reading and looking up?

If I'm reading, I want Lute to help see/think of past NCMWs I've read. If I see "Gibst du mir ein Stück ab?" for the first time, maybe it's enough to have Lute somehow remind me of the form "[ab]geben", and not necessarily highlight the "gibst...ab" term in the sentence.

Perhaps the parent term "geben" could just list the different forms it could have, in the translation or in some other note: "[ab|aus|...]geben", so that I'm reminded to look for those.

Maybe if I hover over the "ab", Lute shows things earlier in the sentence that it could be associated with, e.g. "ab" shows popup "gibst...ab". This implies that Lute has all of the forms for the terms in the present sentence at parsing/rendering time. I'm not sure what the ramifications would be.

For me, the other thing that matters when reading is the ability to look up prior sentences. Currently, Lute sentence matches are done with straight text matching only. With particles like "[ab]geben", maybe this would need to be expanded somehow -- maybe Lute would give some false positives (bad examples) but that could be acceptable; e.g., the (nonsense) sentence "du GIBST mir aus, und er sieht AB" would return as a match for "gibst...ab", even though that's not really present in the sentence.
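A loose lookup of that kind could be sketched with a regular expression that only requires the parts to appear in order, with anything in between. `loose_pattern` and `find_examples` are hypothetical helpers, not Lute functions, and the sketch deliberately accepts the false positive described above.

```python
import re

def loose_pattern(parts):
    # Parts must appear as whole words, in order, with any text between them.
    body = r"\b.*?\b".join(re.escape(p) for p in parts)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)

def find_examples(parts, sentences):
    """Return sentences containing all parts in order (false positives included)."""
    pat = loose_pattern(parts)
    return [s for s in sentences if pat.search(s)]

sentences = [
    "Gibst du mir ein Stück von deinem Kuchen ab?",
    "du GIBST mir aus, und er sieht AB",   # nonsense sentence, still matches
    "Ich gebe das Geld aus.",
]
for s in find_examples(["gibst", "ab"], sentences):
    print(s)
```

Both of the first two sentences are returned; only the first is a real example of "gibst...ab".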

Alleviating term mapping hassles

Creating and mapping all of the stuff (e.g. "geben" has "[aus]geben"/"[ab]geben" siblings, "gibst" child, "[ab]geben" has "gibst...ab" child etc) is a drag, but could be alleviated by having data files available for people to load (when they're ready!) or having users share their data files. I personally think that taking the time to create all of these mappings is useful time.

jzohrab avatar Aug 04 '23 20:08 jzohrab

Other examples from https://emmalovesgerman.com/german-separable-verbs/, possible test cases:

  • Ich stehe um 7 Uhr auf.
  • Was hast du heute vor?
  • Ich kann den Kuchen mitbringen.
  • Er will das Geld für ein Auto ausgeben.
  • Er gibt das Geld für ein Auto aus.
  • Wach auf!
  • Bitte räumen Sie die Küche auf
  • Ich werde das nächste Mal richtig zuhören.
  • Ich habe das Rauchen aufgegeben.
  • Ich schlafe um 22 Uhr ein, weil ich um 6 Uhr morgens aufwache.

jzohrab avatar Sep 13 '23 23:09 jzohrab

In Mandarin Chinese, there are also separable words. The same feature can also be used to add phrases. My thought on German separable verbs is to detect specific strings after a verb in the sentence: comma, period, semicolon, colon, exclamation mark, question mark, “und”, “aber”. If the prefix is just before such a string, then it is detected as part of a separable verb.

Maybe the user can add such words and phrases manually.
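The delimiter heuristic described above might look roughly like the sketch below. It assumes the verb form and prefix are already known, treats the end of the sentence as a boundary too, and uses hypothetical names (`BOUNDARY`, `detect_separable`); it is an illustration, not a tested parser.

```python
import re

# Delimiters named in the comment: punctuation plus "und" / "aber".
BOUNDARY = re.compile(r"[,.;:!?]|\bund\b|\baber\b")

def detect_separable(sentence, verb_form, prefix):
    """True if verb_form and a clause-final prefix occur in the same clause."""
    for clause in BOUNDARY.split(sentence):
        tokens = clause.lower().split()
        # The prefix must be the last token before a boundary,
        # and the verb form must come earlier in the same clause.
        if verb_form in tokens[:-1] and tokens[-1] == prefix:
            return True
    return False

print(detect_separable("Ich stehe um 7 Uhr auf.", "stehe", "auf"))
print(detect_separable("Er gibt das Geld aus, und sie steht auf.", "gibt", "auf"))
```

Because each clause is checked separately, "gibt" is not paired with the "auf" that belongs to "steht" in the second sentence.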

GrimPixel avatar Feb 11 '24 00:02 GrimPixel

This could be one solution:

  1. I can click on any word and then press control and click another word
  2. This will create a new term like any other word or multi-word phrase ### + ###
  3. They are not permanently displayed and only come up when you click the two words again (step 1)
  4. Hovering over either word when both are selected would give the tooltip for ### + ###

Maybe up it to include 3 or more? IDK

M-Biggles avatar Feb 11 '24 15:02 M-Biggles

Note from MyCheze in Discord:


Idea for this, cause Czech does it too... Sorta. But instead of the infinitive being one word, it's two. Much like English phrasal verbs. German is closer to Spanish reflexive verbs.

But, couldn't there be some kind of "toggle" for "separable" words that are assumed to have a fuzzy order? When I add a "two piece" word to Lute, it only looks for those words in that exact order. But, if I hit the toggle, then it could look to see if those words occur in the same sentence together, regardless of the order or number of words between them.

Then, if the parent only exists as a 2 piece word, it can display that as the definition. But if there is also a parent that isn't separable, it just displays them both as parents.

Example:

  • učit = to teach
  • učit se = to learn (toggled to be a "fuzzy" term)

Sentence one: to moje mluvení a že se snažíte učit česky

Učit in this sentence would have both učit and učit se as parents. Se would have its definition (oneself) and učit se.

Sentence two: Můžete mě učit, přesně jako Peter.

In this sentence, učit would only have the parent učit

Irrefutable and ironclad logic: "This word has a parent that has a fuzzy search connection! Lemme check the sentence for the other part(s). If I find it, use that as a parent. If not, ignore it!"
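The toggle logic can be sketched as follows, using the two Czech sentences above. The `TERMS` table and `parents_for` are hypothetical names. Non-fuzzy multi-word terms are skipped here on the assumption that the existing exact-order matching would handle them.

```python
import re

# Hypothetical term table: term text -> whether the "fuzzy" toggle is set.
TERMS = {"učit": False, "učit se": True}

def parents_for(word, sentence):
    """Return the terms that could be parents of `word` in this sentence."""
    word = word.lower()
    tokens = set(re.findall(r"\w+", sentence.lower()))
    found = []
    for term, fuzzy in TERMS.items():
        parts = term.split()
        if word not in parts:
            continue
        # Single-word terms always apply; fuzzy terms apply only when every
        # part occurs somewhere in the sentence, in any order.
        if len(parts) == 1 or (fuzzy and all(p in tokens for p in parts)):
            found.append(term)
    return found

print(parents_for("učit", "to moje mluvení a že se snažíte učit česky"))
print(parents_for("učit", "Můžete mě učit, přesně jako Peter."))
```

In the first sentence "učit" gets both parents; in the second, only "učit", because "se" is absent.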


jzohrab avatar May 07 '24 17:05 jzohrab

A way of dealing with non-consecutive multi-word terms:

In the English language, taking phrasal verbs as an example, as in "put the gun down", a way to select "put" and then use the Alt key or another to select "down" would be nice. This way Lute understands that "put...down" means the same as "put down". In another sentence, if you realize that the words "put" and "down" are unrelated, you can select the word "down" and tell Lute that.

Examples: Term: take away from / variation: take...away from. Term: make up / variation: make...up.

RogerHetfield avatar Jun 27 '24 11:06 RogerHetfield