tf_seq2seq_chatbot
Error when running "python train.py"
Preparing dialog data in /var/lib/tf_seq2seq_chatbot/data
Creating vocabulary /var/lib/tf_seq2seq_chatbot/data/vocab20000.in from data /var/lib/tf_seq2seq_chatbot/data/chat.in
Traceback (most recent call last):
File "train.py", line 15, in
Hi Minhbk,
I'm getting the same error; it's just as the message says. You're running into encoding problems. That byte (a soft hyphen, apparently) is not valid UTF-8 on its own, so the file is probably not UTF-8 encoded.
Try opening with a different encoding if you can, otherwise modify the input text to use standard hyphens.
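To illustrate "opening with a different encoding", here is a minimal sketch that tries a few encodings in turn. The helper name `read_text_lenient` and the candidate list are assumptions for illustration, not part of tf_seq2seq_chatbot; cp1252 is only a guess based on the 0xAD and 0x97 bytes reported in this thread.

```python
import io


def read_text_lenient(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in turn and return (text, encoding_used).

    The candidate list is an assumption; latin-1 maps every byte to a
    character, so with the default list the final attempt cannot fail.
    """
    for enc in encodings:
        try:
            with io.open(path, "r", encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            # This encoding cannot represent the file; try the next one.
            continue
    raise ValueError("none of the encodings could decode %s" % path)
```

Usage would look like `text, enc = read_text_lenient("chat.in")`; the returned `enc` tells you which encoding finally worked.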
I ran this code with Python 2.7, and it works! I don't understand why it works :D
Solution: Remove or replace all non-ASCII characters from the training data. On Windows:
- Install Notepad++ from here: https://notepad-plus-plus.org/download/v7.3.html
- Open chat.in in Notepad++ and do the following:
  - Press Ctrl+F
  - Go to the "Replace" tab
  - Change the search mode to "Regular Expression"
  - Paste this regex for non-ASCII characters into the "Find what :" field: [^\x00-\x7F]+
  - Leave the "Replace with : " field empty
  - Push "Replace All"
- Also, you can try Encoding -> Convert to ANSI and then Save
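For anyone without Notepad++, the same regex can be applied from Python. This is a minimal sketch, not code from the repository: `strip_non_ascii` is a hypothetical helper, and it works on raw bytes so that the file's actual encoding does not matter.

```python
import re

# Same pattern as in the Notepad++ steps, applied to bytes instead of text.
NON_ASCII = re.compile(rb"[^\x00-\x7F]+")


def strip_non_ascii(src_path, dst_path):
    """Copy src_path to dst_path with every run of non-ASCII bytes removed.

    Returns the number of runs that were removed.
    """
    with open(src_path, "rb") as src:
        data = src.read()
    cleaned, runs_removed = NON_ASCII.subn(b"", data)
    with open(dst_path, "wb") as dst:
        dst.write(cleaned)
    return runs_removed
```

Writing to a separate output file (for example `strip_non_ascii("chat.in", "chat.clean.in")`) keeps the original data around in case the cleaning removes more than intended.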
Enjoy, and be ready for a loooooong training time =)
The problem is that the dataset contains non-ASCII characters (about 3k occurrences), such as 0xAD (a short "-"; U+00AD in Unicode: http://www.fileformat.info/info/unicode/char/ad/index.htm) and 0x97 (a long "-").
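A quick way to check which non-ASCII bytes a file contains, and how often, is to count them. The sketch below is only an illustration; the function names are made up for this example.

```python
from collections import Counter


def count_non_ascii_bytes(data):
    """Return a Counter mapping each non-ASCII byte value to its frequency."""
    return Counter(b for b in bytearray(data) if b > 0x7F)


def format_report(counter):
    """Render the counts as lines like '0xAD: 2', most frequent first."""
    return ["0x%02X: %d" % (value, n) for value, n in counter.most_common()]
```

For example, `format_report(count_non_ascii_bytes(open("chat.in", "rb").read()))` would list every offending byte value in the data file.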
Works on Win 10, GPU TensorFlow and Python 3.5.
@FrayaMiner Thanks for your response. You are using the same OS configuration as I am. I have tested the Windows 10 + GPU + Python 3.5 configuration on an example TensorFlow model and it works fantastically. Could you please help me with the initial steps to run this chatbot module? I am getting the following error while executing the code.
(tensorflow-gpu) C:\Users\user1>python C:\Users\user1\Downloads\tf_seq2seq_chatbot\tf_seq2seq_chatbot\train.py
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
Preparing dialog data in /var/lib/tf_seq2seq_chatbot/data
Creating vocabulary /var/lib/tf_seq2seq_chatbot/data\vocab20000.in from data /var/lib/tf_seq2seq_chatbot/data\chat.in
Traceback (most recent call last):
File "C:\Users\user1\Downloads\tf_seq2seq_chatbot\tf_seq2seq_chatbot\train.py", line 15, in
@dsblr Solution:
- Copy the file "tf_seq2seq_chatbot/data/train/movie_lines_selected.txt"
- Rename the copy to "chat.in"
- In ../tf_seq2seq_chatbot/configs/config.py, specify the absolute path to "chat.in" in Windows style with the special characters escaped (the backslash '\' is special and needs to be escaped), for example: "D:\\tf_seq2seq_chatbot\\tf_seq2seq_chatbot\\data\\train\\" or "D:\\tf_seq2seq_chatbot\\tf_seq2seq_chatbot\\data\\train" (I'm not sure which)
- Follow the steps from my comment above to clean non-ASCII characters out of "chat.in"
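On the path question in the steps above, here is a small sketch of how Windows paths can be written in Python. `DATA_DIR` is a hypothetical name (check config.py for the real variable), and the sketch assumes the project builds file names with `os.path.join`, which the mixed `/` and `\` in the log output suggests but does not prove. `ntpath` is used so the snippet behaves the same on any OS; on Windows `os.path` is the same module.

```python
import ntpath  # Windows flavour of os.path; importable on any OS

# Two equivalent spellings of the same directory (path from the comment above).
escaped = "D:\\tf_seq2seq_chatbot\\tf_seq2seq_chatbot\\data\\train"
raw = r"D:\tf_seq2seq_chatbot\tf_seq2seq_chatbot\data\train"
assert escaped == raw

# A raw string cannot end in a single backslash, so the variant with a
# trailing separator has to use the escaped form.
with_trailing = escaped + "\\"

# DATA_DIR is a hypothetical name; the real config variable may differ.
DATA_DIR = raw
chat_file = ntpath.join(DATA_DIR, "chat.in")
same_file = ntpath.join(with_trailing, "chat.in")
print(chat_file)
```

If the join assumption holds, both variants from the comment (with or without the trailing backslash) resolve to the same file.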
@FrayaMiner Thank you for your response. I am now getting the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 2329: invalid start byte. I have cleansed the chat.in file as per the steps you mentioned. Please help me resolve this.
@dsblr You can try to search and clean the file manually, but I prefer cleaning with a regex. Also, you can try Encoding -> Convert to ANSI and then save.
The character 0xad at position 2329 is a non-ASCII "soft hyphen": http://www.fileformat.info/info/unicode/char/ad/index.htm
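To confirm whether a cleaned file still contains a byte like this, one option is to attempt the decode and read the failing offset from the exception. This is a hedged sketch with hypothetical helper names; it relies only on the standard `start` attribute of `UnicodeDecodeError`.

```python
def first_bad_offset(data, encoding="utf-8"):
    """Return the offset of the first undecodable byte, or None if clean."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        return err.start
    return None


def context(data, offset, width=20):
    """Return the bytes surrounding `offset`, to help find it in an editor."""
    return data[max(0, offset - width):offset + width]
```

Running `first_bad_offset(open("chat.in", "rb").read())` repeatedly after each cleaning pass shows whether any stray bytes were missed; `None` means the file decodes as UTF-8.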
@FrayaMiner Thanks once again for your response. I tried to remove the soft hyphens manually, but the error still persists. I believe I might have missed a few of the soft hyphens. Could you please share the cleansed chat file that you are using? It would be so helpful.
Thanks
@FrayaMiner Is there a way to solve the following error? I am getting this while running test.py:
"C:\Work\tf_seq2seq_chatbot\tf_seq2seq_chatbot\lib\seq2seq_model_utils.py", line 43, in get_predicted_sentence
bucket_id = min([b for b in xrange(len(BUCKETS)) if BUCKETS[b][0] > len(input_token_ids)])
NameError: name 'xrange' is not defined
The above error was solved after importing xrange into seq2seq_model_utils.py.
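For reference, `xrange` does not exist in Python 3, where `range` plays the same role. One way to make the name available on both versions is sketched below. The `BUCKETS` values and the input ids are illustrative assumptions, not the project's actual configuration; `from six.moves import xrange` is a common alternative if the `six` package is installed.

```python
# Make `xrange` available on both Python 2 and Python 3.
try:
    xrange  # Python 2: the builtin already exists
except NameError:
    xrange = range  # Python 3: range is already lazy

# Illustrative values only; the real BUCKETS come from the project's config.
BUCKETS = [(5, 10), (10, 15), (20, 25), (40, 50)]
input_token_ids = [11, 12, 13, 14, 15, 16]

# Same expression as in the traceback: the smallest bucket whose input
# size exceeds the number of input tokens.
bucket_id = min([b for b in xrange(len(BUCKETS))
                 if BUCKETS[b][0] > len(input_token_ids)])
print(bucket_id)
```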
But when executing chat.py, I get another error:
hello
Traceback (most recent call last):
File "chat.py", line 15, in
tf.app.run()
File "D:\Users\user1\chatbots\tensorflow\softwares\anaconda\lib\site-packages\tensorflow\python\platform\app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "chat.py", line 12, in main
chat()
File "C:\Work\tf_seq2seq_chatbot\tf_seq2seq_chatbot\lib\chat.py", line 27, in chat
predicted_sentence = get_predicted_sentence(sentence, vocab, rev_vocab, model, sess)
File "C:\Work\tf_seq2seq_chatbot\tf_seq2seq_chatbot\lib\seq2seq_model_utils.py", line 65, in get_predicted_sentence
output_sentence = ' '.join([rev_vocab[output] for output in outputs])
File "C:\Work\tf_seq2seq_chatbot\tf_seq2seq_chatbot\lib\seq2seq_model_utils.py", line 65, in
output_sentence = ' '.join([rev_vocab[output] for output in outputs])
IndexError: list index out of range
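The traceback shows `rev_vocab[output]` failing, which means at least one id in `outputs` is not a valid index into `rev_vocab`. The thread does not say why; one possible cause (an assumption, not confirmed here) is a vocabulary file that is smaller than the vocabulary size the model was built with. The sketch below is a diagnostic aid, not a fix from the repository: `ids_to_sentence` and the `"_UNK"` placeholder are hypothetical.

```python
def ids_to_sentence(outputs, rev_vocab, unk_token="_UNK"):
    """Join predicted ids into a sentence, tolerating out-of-range ids.

    Ids outside [0, len(rev_vocab)) are replaced by `unk_token` instead
    of raising IndexError, so the mismatch becomes visible in the output.
    """
    words = []
    for token_id in outputs:
        if 0 <= token_id < len(rev_vocab):
            words.append(rev_vocab[token_id])
        else:
            words.append(unk_token)
    return " ".join(words)
```

If placeholders show up in the replies, comparing `len(rev_vocab)` with the configured vocabulary size would be a reasonable next check.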
Guys, any idea why I am getting the error below? I have the file ssd_mobilenet_v1_pets.config in the folder, but it says it cannot open it, without specifying the reason why it cannot find the file. I am working on Windows with Python 3.5.3 and the latest version of TensorFlow.
Traceback (most recent call last):
File "C:\Program Files (x86)\Python 3.5.2\tensorflow\my codes\objectDetectionOwnTrained_Flowers\train.py", line 202, in
Have you solved the problem?
Can anyone solve the below errors?
Traceback (most recent call last):
File "export_inference_graph.py", line 106, in
File "train.py", line 184, in
C:\TensorFlow\models\research\object_detection>
Hello, I want to ask if you have solved the problem
I am very worried. Did you solve it?
/label_map??????????????/ PBTXT: ϵ ͳ \ udcd5 Ҳ \ udcbb \ udcb5 \ udcbd ָ \ udcb6 \ udca8 \ udcb5 \ udcc4 · \ udcbe \ udcb6 \ udca1 \ udca3