practical_seq2seq

Repeating few tokens in all outputs

Open dikshant2210 opened this issue 6 years ago • 4 comments

While running the notebook for the twitter chatbot I got outputs that are repetitive and use only four or five tokens from the vocab. For example, most of my responses were like: i i i you you i i you

dikshant2210 avatar Apr 06 '18 06:04 dikshant2210

I also have the same problem, did you fix it by any chance? I was thinking it might be because I only trained it for 12k epochs, but I would still expect a short answer rather than this: https://scontent-amt2-1.xx.fbcdn.net/v/t1.0-9/30697628_10214128158190577_121012735084331008_o.png?_nc_cat=0&oh=3796763b6a1f131daa2c9b811ded5b76&oe=5B586EDA

nicolagheza avatar Apr 11 '18 15:04 nicolagheza

Try increasing the vocabulary size. It worked for me.
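For anyone else hitting this: in this repo the vocabulary is built during dataset preprocessing by keeping only the most frequent tokens, so "increasing the vocabulary size" means raising that cutoff and re-running preprocessing, not changing the training notebook. A minimal sketch of the idea (function and variable names here are illustrative, not the repo's exact API):

```python
from collections import Counter

def build_vocab(tokenized_lines, vocab_size, unk='unk'):
    # count token frequencies across the whole corpus
    freq = Counter(tok for line in tokenized_lines for tok in line)
    # '_' is a padding symbol; any token outside the top vocab_size
    # most frequent words will later be mapped to unk
    idx2w = ['_', unk] + [w for w, _ in freq.most_common(vocab_size)]
    w2idx = {w: i for i, w in enumerate(idx2w)}
    return idx2w, w2idx
```

Raising the size limit (say from 6000 to 8000) means fewer corpus tokens collapse to unk, which is one common cause of degenerate, repetitive decoder output.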


karanpande avatar Apr 11 '18 15:04 karanpande

How did you increase the vocabulary size?

xvocab_size = len(metadata['idx2w']) 
yvocab_size = xvocab_size

I am using this. And with a simple query such as "What is your name?" I get back: "sucked800kitchenburgersburgersburgersburgersskipskipskipskipskipskipskipregisterregisterregisteryesyesyes"
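For what it's worth, those two lines only read the size of a vocabulary that was already fixed when the metadata pickle was generated, so changing them won't add any words; the dataset has to be rebuilt with a larger vocabulary limit. A quick diagnostic for whether the vocabulary is too small is to measure how much of the corpus falls outside it (a sketch with hypothetical names, assuming a word-to-index dict like the one stored in metadata):

```python
def unk_fraction(tokenized_lines, w2idx):
    # fraction of corpus tokens that fall outside the vocabulary;
    # a high value means many training targets become the unk token
    total = 0
    known = 0
    for line in tokenized_lines:
        for tok in line:
            total += 1
            known += tok in w2idx
    return (1.0 - known / total) if total else 0.0
```

If this comes back high, repetitive outputs like the one above are not surprising: the model is mostly learning to predict unk and a handful of frequent filler words.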

nicolagheza avatar Apr 11 '18 17:04 nicolagheza

Try a bigger training dataset.

thormacy avatar Jun 20 '18 06:06 thormacy