An-AI-Chatbot-in-Python-and-Flask

NLTK changed (I think)

Open gdobrasnki opened this issue 2 years ago • 8 comments

When I run the code as-is, I get the error below. I'm guessing the following:

I believe the words list changed on download from NLTK, so the rows end up with mismatched lengths of 126 and 55.

    training.append([bag, output_row])
training[0]  :  [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]]
Len [0] -- 126
Len [1] -- 55
**********
Traceback (most recent call last):
  File "C:\dev\hatBot\train.py", line 87, in <module>
    training = np.array(training)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (55, 2) + inhomogeneous part.

I can 'fix' the error by just limiting the bag, but I'm guessing that really affects the training, and I don't want to cascade down the code fixing bugs along the way only to find out I've really messed up the model. I know I'm way over my head.

If you download and run the repo, you get these errors.
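For what it's worth, the traceback looks like the ragged-array behavior that newer NumPy rejects outright: each training entry pairs a bag of length 126 (`words`) with an output row of length 55 (`classes`). A minimal sketch of the failure and one common workaround, with the lengths taken from the output above (`train_x`/`train_y` are my names, not the repo's):

```python
import numpy as np

# Hypothetical reproduction of the mismatch: each training entry pairs a
# bag of len(words) == 126 with an output row of len(classes) == 55.
words_len, classes_len = 126, 55
training = [[[0] * words_len, [0] * classes_len] for _ in range(3)]

# np.array(training) raises ValueError on NumPy >= 1.24 because the inner
# lists have different lengths ("inhomogeneous shape"). One common
# workaround is to build the inputs and labels as separate arrays:
train_x = np.array([bag for bag, _ in training])
train_y = np.array([row for _, row in training])
```

This keeps the bag sized by the vocabulary and the labels sized by the classes, instead of forcing them into one array.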

gdobrasnki avatar Jan 10 '23 20:01 gdobrasnki

Facing the same issue.

JamesTesting888 avatar Jan 14 '23 09:01 JamesTesting888

How badly is it affecting the training?

JamesTesting888 avatar Jan 14 '23 09:01 JamesTesting888

I was able to fix it, and I think it did not really affect the training for me.

for doc in documents:
    # initialize the bag, sized by classes instead of words
    bag = [0] * len(classes)
    # list of tokenized words for the pattern
    pattern_words = doc[0]
    # lemmatize each word to its base form, to group related words
    pattern_words = [lemmatizer.lemmatize(word.lower()) for word in pattern_words]
    # if any vocabulary word appears in the pattern, mark the document's class
    for w in words:
        if w in pattern_words:
            index = classes.index(doc[1])
            bag[index] = 1

I used `classes` instead of `words`.
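If I follow the change, both `bag` and `output_row` end up with length `len(classes)`, so `np.array(training)` no longer sees ragged rows. A toy sketch under that assumption, with made-up `documents`/`classes` (not the repo's real data):

```python
import numpy as np

# Tiny made-up dataset mirroring the repo's structures.
classes = ["greeting", "goodbye"]
documents = [(["hi", "there"], "greeting"), (["bye"], "goodbye")]

training = []
for doc in documents:
    # per the fix above, the bag is sized by classes, not by the vocabulary
    bag = [0] * len(classes)
    bag[classes.index(doc[1])] = 1
    output_row = [0] * len(classes)
    output_row[classes.index(doc[1])] = 1
    training.append([bag, output_row])

arr = np.array(training)  # homogeneous now: shape (2, 2, 2)
```

Note that `bag` and `output_row` come out identical here, so the inputs may no longer carry any word information into training.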

JamesTesting888 avatar Jan 14 '23 09:01 JamesTesting888

Then changed the bow function in app.py.

def clean_up_sentence(sentence):
    sentence_words = nltk.word_tokenize(sentence)
    sentence_words = [lemmatizer.lemmatize(word.lower()) for word in sentence_words]
    return sentence_words

def bow(sentence, classes, show_details=True):
    sentence_words = clean_up_sentence(sentence)
    bag = [0] * len(classes)
    for s in sentence_words:
        for i, w in enumerate(classes):
            if w == s:
                bag[i] = 1
                if show_details:
                    print("found in bag: %s" % w)
    return np.array(bag)

For me, it seems that it predicts fine.
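For reference, a minimal sketch of what the modified `bow` returns: a vector over the intent classes rather than the vocabulary, which is why the lengths now line up with the retrained model. I've swapped in whitespace splitting and lowercasing for `nltk.word_tokenize` and the lemmatizer so the sketch is self-contained, and the `classes` list is made up:

```python
import numpy as np

# Simplified stand-ins for the thread's helpers: split on whitespace
# instead of nltk.word_tokenize, lowercase instead of lemmatizing.
def clean_up_sentence(sentence):
    return [word.lower() for word in sentence.split()]

def bow(sentence, classes, show_details=False):
    sentence_words = clean_up_sentence(sentence)
    bag = [0] * len(classes)
    for s in sentence_words:
        for i, w in enumerate(classes):
            if w == s:
                bag[i] = 1
                if show_details:
                    print("found in bag: %s" % w)
    return np.array(bag)

# Hypothetical class list; the vector length equals len(classes).
classes = ["greeting", "goodbye", "thanks"]
vec = bow("thanks and goodbye", classes)
```

One caveat with this version: an entry only fires when a class name literally appears in the sentence, so ordinary inputs like "hello" would produce an all-zero bag.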

JamesTesting888 avatar Jan 14 '23 09:01 JamesTesting888

Hi @gdobrasnki @JamesTesting888 Thanks for reporting this. I'll dig down to the root cause of it and provide an update.

@JamesTesting888 if your fix was correct, please create a pull request and I'll review and merge. Thank you

mainadennis avatar Jan 14 '23 16:01 mainadennis

Thank you. I'm not sure it is; I just did a very quick fix to get rid of the error. It's best for you to have a look, because I just started learning AI.

JamesTesting888 avatar Jan 18 '23 09:01 JamesTesting888

Thanks guys, I passed on the repo, but glad there's a fix!

gdobrasnki avatar Jan 18 '23 15:01 gdobrasnki

@mainadennis My fix doesn't work. Can you please look into it? Thanks!

JamesTesting888 avatar Jan 19 '23 14:01 JamesTesting888