t9-emulator

can't find 'hello'

Open tadjik1 opened this issue 10 years ago • 9 comments

Hi!

It's a great project! But I found a strange bug: I can't find the word 'hello', only 'gekko'.

Maybe you know what's wrong?

tadjik1 avatar Jun 30 '14 17:06 tadjik1

You're right. Perhaps there is a bug in the tree traversal code, as 'hello' is already in the dictionary. I'll take a look at this. Thanks for noticing :)

arifwn avatar Jun 30 '14 18:06 arifwn

OK, I'll try to help you. The algorithm seems unoptimized for search.

And please add the "w" letter ;)

tadjik1 avatar Jun 30 '14 19:06 tadjik1

So... I think the problem is in the loop over the leaves of a nested node. If you add this code:

if (currentWord.length === sequence.length && words.indexOf(currentWord) === -1) {
  words.push(currentWord);
} 

after the loop, everything works fine.

tadjik1 avatar Jun 30 '14 23:06 tadjik1

Thanks, but that would add partial words to the list such as helln (a partial of hellnes) when you attempt to type hello.
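
To make the difference between whole words and partial matches concrete, here is a minimal sketch. It is not the project's code: the `KEYPAD` layout is the standard phone keypad (assumed, since the project's key map is not shown in this thread), the helper names are hypothetical, and the three-word dictionary is a toy one.

```javascript
// Standard phone keypad layout (an assumption, not taken from the project).
const KEYPAD = { 2: 'abc', 3: 'def', 4: 'ghi', 5: 'jkl', 6: 'mno', 7: 'pqrs', 8: 'tuv', 9: 'wxyz' };

// Reverse lookup table: letter -> digit.
const LETTER_TO_DIGIT = {};
for (const [digit, letters] of Object.entries(KEYPAD)) {
  for (const letter of letters) LETTER_TO_DIGIT[letter] = digit;
}

// Convert a word to its digit sequence, e.g. 'hello' -> '43556'.
function wordToDigits(word) {
  return [...word.toLowerCase()].map((ch) => LETTER_TO_DIGIT[ch]).join('');
}

// Whole dictionary words whose digit sequence equals the typed sequence.
function completeWords(dictionary, sequence) {
  return dictionary.filter((word) => wordToDigits(word) === sequence);
}

// Every distinct prefix of the typed length that matches the sequence,
// whether or not that prefix is itself a dictionary word.
function prefixCandidates(dictionary, sequence) {
  const found = new Set();
  for (const word of dictionary) {
    const prefix = word.slice(0, sequence.length);
    if (prefix.length === sequence.length && wordToDigits(prefix) === sequence) {
      found.add(prefix);
    }
  }
  return [...found];
}

// Toy dictionary for illustration only.
const toyDictionary = ['hello', 'gekko', 'hellnes'];
console.log(completeWords(toyDictionary, '43556'));    // [ 'hello', 'gekko' ]
console.log(prefixCandidates(toyDictionary, '43556')); // [ 'hello', 'gekko', 'helln' ]
```

With whole-word matching only, 'helln' never shows up; with prefix matching it does, which is the trade-off being discussed here.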

I pushed a fix and added 'w' to the buttons.

arifwn avatar Jul 01 '14 09:07 arifwn

Yes, and this is right. When you start typing, you can find a part of the word. When you type “43”, you see “he”, “ge”, which are parts of the words “hello”, “gekko”, and many others.

Moreover, this algorithm is unoptimized: you check every leaf in the tree instead of checking only the necessary ones.

For example, when I type “2”, you know that I want to check words starting with “a”, “b” or “c”. But you go through every leaf in the root node.
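
The idea can be sketched roughly as follows. This is an illustration under assumptions, not the code from either project: the trie shape (`children`, `isWord`), the function names `buildTrie` and `lookup`, and the standard-keypad `KEYPAD` map are all hypothetical.

```javascript
// Standard phone keypad layout (an assumption, not taken from the project).
const KEYPAD = { 2: 'abc', 3: 'def', 4: 'ghi', 5: 'jkl', 6: 'mno', 7: 'pqrs', 8: 'tuv', 9: 'wxyz' };

// Build a simple trie: each node has a children map and an end-of-word flag.
function buildTrie(words) {
  const root = { children: {}, isWord: false };
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      if (!node.children[ch]) node.children[ch] = { children: {}, isWord: false };
      node = node.children[ch];
    }
    node.isWord = true;
  }
  return root;
}

// For each typed digit, follow only the children whose letter belongs to that key,
// instead of looping over every child of the node.
function lookup(root, sequence) {
  let frontier = [{ node: root, prefix: '' }];
  for (const digit of sequence) {
    const next = [];
    for (const { node, prefix } of frontier) {
      for (const letter of KEYPAD[digit] || '') {
        const child = node.children[letter];
        if (child) next.push({ node: child, prefix: prefix + letter });
      }
    }
    frontier = next;
  }
  // Keep only nodes that end a complete word; drop this filter to show partial words too.
  return frontier.filter(({ node }) => node.isWord).map(({ prefix }) => prefix);
}

const trie = buildTrie(['hello', 'gekko', 'help']);
console.log(lookup(trie, '43556')); // [ 'gekko', 'hello' ]
console.log(lookup(trie, '4357'));  // [ 'help' ]
```

Each step does at most three or four direct child lookups per node on the frontier, however many children the node actually has.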

tadjik1 avatar Jul 01 '14 11:07 tadjik1

About ignoring partial words, that's the decision I made when I started tinkering with this small side project. You're right that the tree parsing is not optimised. I used the code from https://github.com/jrolfs/javascript-trie-predict and didn't really look into it until you pointed out the hello bug.

arifwn avatar Jul 01 '14 11:07 arifwn

OK, I agree with you. Anyway, your algorithm is O(n * m), where m is the alphabet length (26 in your case) and n is the sequence length. Mine is O(n * 3).
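
As a rough back-of-the-envelope check, the sketch below counts how many candidate letters are tried along a single path through the tree for each strategy. The function name and the standard-keypad `KEYPAD` map are assumptions for illustration; a real traversal visits several nodes per level, so these are per-path counts only. Note that keys 7 and 9 carry four letters, so the constant is 3 or 4 rather than exactly 3.

```javascript
// Standard phone keypad layout (an assumption, not taken from the project).
const KEYPAD = { 2: 'abc', 3: 'def', 4: 'ghi', 5: 'jkl', 6: 'mno', 7: 'pqrs', 8: 'tuv', 9: 'wxyz' };
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz';

// Count the letters tried along one path for a typed sequence:
// scanAll tries every letter of the alphabet at each step,
// mappedOnly tries just the letters printed on the pressed key.
function checksPerPath(sequence) {
  let scanAll = 0;
  let mappedOnly = 0;
  for (const digit of sequence) {
    scanAll += ALPHABET.length;
    mappedOnly += (KEYPAD[digit] || '').length;
  }
  return { scanAll, mappedOnly };
}

console.log(checksPerPath('43556')); // { scanAll: 130, mappedOnly: 15 }
```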

Thanks for your attention =)

tadjik1 avatar Jul 01 '14 11:07 tadjik1

No problem. Thanks for pointing out that the prediction algorithm actually sucks :) I should probably rewrite it when things settle down. What I really want to do is sort the predicted words by frequency (http://www.americannationalcorpus.org/SecondRelease/frequency2.html) and add some sort of autocomplete (so displaying partial words as you suggested would make sense, as they can be autocompleted).
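
A frequency-based ordering could look something like the sketch below. Nothing here exists in the project yet: `rankByFrequency` is a hypothetical helper, and the counts are made-up placeholders rather than values from the corpus linked above.

```javascript
// Sort candidate words by descending frequency; words without a known
// frequency count as 0, and ties are broken alphabetically.
function rankByFrequency(candidates, frequencies) {
  return [...candidates].sort((a, b) => {
    const diff = (frequencies.get(b) ?? 0) - (frequencies.get(a) ?? 0);
    return diff !== 0 ? diff : a.localeCompare(b);
  });
}

// Made-up counts for illustration only.
const madeUpCounts = new Map([
  ['hello', 120],
  ['gekko', 3],
]);

console.log(rankByFrequency(['gekko', 'hello', 'hellp'], madeUpCounts));
// [ 'hello', 'gekko', 'hellp' ]
```

The input array is copied before sorting, so the caller's candidate list keeps its original order.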

arifwn avatar Jul 01 '14 12:07 arifwn

That would be interesting) I'm making the same thing, but based on Node.js and WebSockets. I'm trying to use frequency and update it based on what users select, plus an add-word function.

It also uses 2 dictionaries: ru and eng. Interesting task =)

tadjik1 avatar Jul 01 '14 16:07 tadjik1