NeuralDialog-CVAE-pytorch
Fluctuating BLEU values
I ran the code, but the BLEU scores fluctuate considerably between epochs. In the log below, for example, the avg recall BLEU goes from 0.12 to 0.51 over the last two epochs, and earlier epochs show similar swings. Any idea why?
Avg recall BLEU 0.127371, avg precision BLEU 0.050736 and F1 0.072566 (only 1 reference response. Not final result)
Avg recall BLEU 0.127371, avg precision BLEU 0.050736 and F1 0.072566 (only 1 reference response. Not final result)
Done testing
>> Epoch 59 with lr 0.001000
0.97 elbo_loss 42.031412 bow_loss 54.999851 rc_loss 26.617763 rc_peplexity 6.985817 kl_loss 15.413649 kl_w 1.000000
Epoch Done elbo_loss 42.444348 bow_loss 55.620425 rc_loss 26.963350 rc_peplexity 7.035742 kl_loss 15.480998 step time 0.0071
Valid begins with 41 batches with 0 left over samples
ELBO_VALID elbo_loss 59.449103 bow_loss 62.838097 rc_loss 43.372815 rc_peplexity 22.055600 kl_loss 16.076289
Test begins with 5481 batches with 0 left over samples
Batch 1 index 0 of topic "HOBBIES AND CRAFTS"
Src 0-1: <s> <d> </s>
Target (wh-question) >> okay what kind of things do you like to do in your spare time
Sample 0 (other) >> okay
Sample 1 (wh-question) >> so what would you wanna do
Sample 2 (other) >> all right
Sample 3 (other) >> okay
Sample 4 (statement-non-opinion) >> okay my sister - off says a little bit
Batch 2 index 0 of topic "HOBBIES AND CRAFTS"
Src 0-0: <s> <d> </s>
Src 1-0: <s> okay what kind of things do you like to do in your spare time </s>
Target (statement-non-opinion) >> well i have two children so i don't have a whole lot of spare time right now one of the things that i've made time for is playing softball
Sample 0 (statement-non-opinion) >> well i've done that for a while and i've gone to a lot of hobbies that i've ever done
Sample 1 (statement-non-opinion) >> oh i just love a new job and i just really enjoy it
Sample 2 (statement-non-opinion) >> well i have a very large writers
Sample 3 (statement-non-opinion) >> woodworking for our hobby
Sample 4 (statement-non-opinion) >> well it's a different crafts to take aerobics now and set aside for a conference in like right now they're almost <unk> to the pound of the home
Batch 3 index 0 of topic "HOBBIES AND CRAFTS"
Src 0-1: <s> <d> </s>
Src 1-1: <s> okay what kind of things do you like to do in your spare time </s>
Src 2-0: <s> well i have two children so i don ' t have a whole lot of spare time right now one of the things that i ' ve made time for is playing softball </s>
Target (abandoned_or_turn-exit/uninterpretable) >> um - hum
Sample 0 (acknowledge_(backchannel)) >> uh - huh
Sample 1 (acknowledge_(backchannel)) >> sure
Sample 2 (abandoned_or_turn-exit/uninterpretable) >> um - hum
Sample 3 (backchannel_in_question_form) >> really
Sample 4 (acknowledge_(backchannel)) >> uh - huh
Batch 4 index 0 of topic "HOBBIES AND CRAFTS"
Src 0-1: <s> <d> </s>
Src 1-1: <s> okay what kind of things do you like to do in your spare time </s>
Src 2-0: <s> well i have two children so i don ' t have a whole lot of spare time right now one of the things that i ' ve made time for is playing softball </s>
Src 3-1: <s> um - hum </s>
Target (acknowledge_(backchannel)) >> uh - huh
Sample 0 (acknowledge_(backchannel)) >> right
Sample 1 (acknowledge_(backchannel)) >> uh - huh
Sample 2 (yes-no-question) >> wow that's nine years old you have to have that
Sample 3 (acknowledge_(backchannel)) >> yeah
Sample 4 (acknowledge_(backchannel)) >> uh - huh
Batch 5 index 0 of topic "HOBBIES AND CRAFTS"
Src 0-0: <s> <d> </s>
Src 1-0: <s> okay what kind of things do you like to do in your spare time </s>
Src 2-1: <s> well i have two children so i don ' t have a whole lot of spare time right now one of the things that i ' ve made time for is playing softball </s>
Src 3-0: <s> um - hum </s>
Src 4-0: <s> uh - huh </s>
Target (statement-non-opinion) >> i really enjoy i enjoy softball but i enjoy all kind of sports i enjoy watching and participating i really like water sports like swimming and skiing but i don't get to do that too often
Sample 0 (statement-non-opinion) >> i don't know i am i'm a i'm a i'm a i'm a professor i enjoy woodworking i don't really have a lot of woodworking i know i
Sample 1 (statement-non-opinion) >> so i enjoy
Sample 2 (statement-non-opinion) >> and i have an awful lot of time for it i've been doing it lately
Sample 3 (statement-non-opinion) >> i tended to stay for my own
Sample 4 (statement-non-opinion) >> so i enjoy it and i just have to do all my time and my husband takes piano lessons and painted the equipment but it's sort of an interesting thing
Batch 6 index 0 of topic "HOBBIES AND CRAFTS"
Src 1-1: <s> okay what kind of things do you like to do in your spare time </s>
Src 2-0: <s> well i have two children so i don ' t have a whole lot of spare time right now one of the things that i ' ve made time for is playing softball </s>
Src 3-1: <s> um - hum </s>
Src 4-1: <s> uh - huh </s>
Src 5-0: <s> i really enjoy i enjoy softball but i enjoy all kind of sports i enjoy watching and participating i really like water sports like swimming and skiing but i don ' t get to do that too often </s>
Target (acknowledge_(backchannel)) >> yeah
Sample 0 (acknowledge_(backchannel)) >> oh yeah it
Sample 1 (yes-no-question) >> right well you're a great golfer that you like
Sample 2 (acknowledge_(backchannel)) >> um - hum right
Sample 3 (statement-non-opinion) >> um - hum and so forth i do that kind of thing
Sample 4 (acknowledge_(backchannel)) >> uh - huh
Avg recall BLEU 0.518341, avg precision BLEU 0.184683 and F1 0.272334 (only 1 reference response. Not final result)
Avg recall BLEU 0.518341, avg precision BLEU 0.184683 and F1 0.272334 (only 1 reference response. Not final result)
Done testing
Best validation loss 54.038599
Done training
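For reading the three numbers on each summary line: the reported F1 values are consistent with the harmonic mean of the precision and recall BLEU. The following is a minimal sketch that checks this against the figures in the log above. The function name `f1_from` is hypothetical and is not taken from the repository; the actual evaluation code may compute F1 differently.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall BLEU, precision BLEU, reported F1) copied from the two summary lines in the log above.
log_rows = [
    (0.127371, 0.050736, 0.072566),
    (0.518341, 0.184683, 0.272334),
]

for recall, precision, reported in log_rows:
    computed = f1_from(precision, recall)
    print(f"recall={recall:.6f} precision={precision:.6f} "
          f"computed F1={computed:.6f} reported F1={reported:.6f}")
```

Because F1 is derived from the other two values, it rises and falls with them, so it is enough to track recall and precision BLEU when comparing epochs.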
Results from two separate runs can also differ considerably. When I ran the same experiment again, I got an avg recall BLEU of 0.28 and 0.22 in the last two epochs. I have copied the logs below:
Avg recall BLEU 0.287736, avg precision BLEU 0.094147 and F1 0.141874 (only 1 reference response. Not final result)
Avg recall BLEU 0.287736, avg precision BLEU 0.094147 and F1 0.141874 (only 1 reference response. Not final result)
Done testing
>> Epoch 59 with lr 0.001000
0.97 elbo_loss 40.211126 bow_loss 52.399249 rc_loss 25.270406 rc_peplexity 6.776904 kl_loss 14.940720 kl_w 1.000000
Epoch Done elbo_loss 40.569005 bow_loss 52.882008 rc_loss 25.541335 rc_peplexity 6.810425 kl_loss 15.027671 step time 0.0071
Valid begins with 41 batches with 0 left over samples
ELBO_VALID elbo_loss 59.682895 bow_loss 63.245441 rc_loss 43.793205 rc_peplexity 22.741406 kl_loss 15.889690
Test begins with 5481 batches with 0 left over samples
Batch 1 index 0 of topic "POLITICS"
Src 0-0: <s> <d> </s>
Target (agree/accept) >> yes
Sample 0 (other) >> being a new car
Sample 1 (wh-question) >> so what kind of things are we
Sample 2 (statement-non-opinion) >> well it's just putting off a broken by hand but it was it was a it was a it was a decision to take the time to find a place stack of it i don't remember
Sample 3 (wh-question) >> okay i don't know what you can do and
Sample 4 (statement-non-opinion) >> well i have been tempted and i have the ads and i get the most of the stuff now as far as the regular goes
Batch 2 index 0 of topic "POLITICS"
Src 0-0: <s> <d> </s>
Src 1-1: <s> yes </s>
Target (agree/accept) >> yes i am
Sample 0 (statement-non-opinion) >> well they were all they were still in public schools
Sample 1 (statement-opinion) >> you know it's just that you don't want to buy
Sample 2 (statement-non-opinion) >> i guess i've got to have a
Sample 3 (agree/accept) >> oh
Sample 4 (statement-non-opinion) >> the way the
Batch 3 index 0 of topic "POLITICS"
Src 0-1: <s> <d> </s>
Src 1-0: <s> yes </s>
Src 2-0: <s> yes i am </s>
Target (statement-non-opinion) >> i've been trying to get people at five thirty and six thirty in the evening and i thought well i'm not having any luck i'll try the middle of the afternoon
Sample 0 (abandoned_or_turn-exit/uninterpretable) >> so we're not ready to
Sample 1 (agree/accept) >> okay
Sample 2 (statement-non-opinion) >> okay well it's going to be the topic of vietnam
Sample 3 (yes-no-question) >> oh i don't know i have been called <unk> or couple of stories about that
Sample 4 (statement-non-opinion) >> i don't know i had to register us and i had that register waiting on it we didn't have to worry about it because i was just thinking that we were
Batch 4 index 0 of topic "POLITICS"
Src 0-0: <s> <d> </s>
Src 1-1: <s> yes </s>
Src 2-1: <s> yes i am </s>
Src 3-0: <s> i ' ve been trying to get people at five thirty and six thirty in the evening and i thought well i ' m not having any luck i ' ll try the middle of the afternoon </s>
Target (statement-non-opinion) >> well i'm happy we got through
Sample 0 (abandoned_or_turn-exit/uninterpretable) >> well
Sample 1 (acknowledge_(backchannel)) >> yeah
Sample 2 (acknowledge_(backchannel)) >> yeah
Sample 3 (backchannel_in_question_form) >> oh great
Sample 4 (acknowledge_(backchannel)) >> uh - huh
Batch 5 index 0 of topic "POLITICS"
Src 0-1: <s> <d> </s>
Src 1-0: <s> yes </s>
Src 2-0: <s> yes i am </s>
Src 3-1: <s> i ' ve been trying to get people at five thirty and six thirty in the evening and i thought well i ' m not having any luck i ' ll try the middle of the afternoon </s>
Src 4-0: <s> well i ' m happy we got through </s>
Target (yes-no-question) >> i am too well do you have any opinion on the subject
Sample 0 (wh-question) >> well how do you think it stopped out on
Sample 1 (statement-non-opinion) >> i suspect the apathy has gone to the american cars
Sample 2 (acknowledge_(backchannel)) >> i see
Sample 3 (acknowledge_(backchannel)) >> yeah
Sample 4 (statement-non-opinion) >> i don't know it's a real good thing for me to say you know if they had a car for a while
Batch 6 index 0 of topic "POLITICS"
Src 1-1: <s> yes </s>
Src 2-1: <s> yes i am </s>
Src 3-0: <s> i ' ve been trying to get people at five thirty and six thirty in the evening and i thought well i ' m not having any luck i ' ll try the middle of the afternoon </s>
Src 4-1: <s> well i ' m happy we got through </s>
Src 5-0: <s> i am too well do you have any opinion on the subject </s>
Target (open-question) >> well i'm especially interested with what what's happening in the soviet union the move to
Sample 0 (statement-non-opinion) >> oh i'm sorry and he's going to take the <unk> and
Sample 1 (statement-non-opinion) >> i'll bet it is something that
Sample 2 (statement-non-opinion) >> we've been here and we have the same <unk> where we wanted to go and i was just amazed too where they're on the road
Sample 3 (statement-non-opinion) >> i don't know i'm not so many people are
Sample 4 (no_answers) >> no
Avg recall BLEU 0.226801, avg precision BLEU 0.086140 and F1 0.124858 (only 1 reference response. Not final result)
Avg recall BLEU 0.226801, avg precision BLEU 0.086140 and F1 0.124858 (only 1 reference response. Not final result)
Done testing
Best validation loss 53.880735
Done training
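One thing visible in both logs is that the test pass prints only six batches out of 5481 before reporting the average, the summary line says "Not final result", and the first topic differs between the two runs. If the per-epoch score is averaged over only a handful of items with randomly sampled responses (an assumption drawn from the log, not verified against the code), large swings would be expected. The sketch below illustrates that effect with synthetic scores. The function name `spread_of_means` and the uniform scores are made up for illustration and are not real BLEU values.

```python
import random
import statistics


def spread_of_means(num_items, num_trials=200, seed=0):
    """Std. dev. of the mean score across repeated evaluations of num_items items."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_trials):
        # Synthetic per-item scores in [0, 1); stand-ins for per-response BLEU, not real data.
        scores = [rng.random() for _ in range(num_items)]
        means.append(statistics.mean(scores))
    return statistics.stdev(means)


small = spread_of_means(6)     # a handful of items, like the six batches printed in the log
large = spread_of_means(5000)  # roughly the size of a full test pass
print(f"std of mean over 6 items:    {small:.4f}")
print(f"std of mean over 5000 items: {large:.4f}")
```

Under that assumption, evaluating on the full test set, or on a fixed subset with a fixed random seed, would be a more reliable way to compare epochs and runs.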
Unfortunately, I didn't look into the details of the BLEU computation. Did you run the TensorFlow version and compare the results?