
Confused about the Model outputs?

Open ArEnSc opened this issue 4 years ago • 29 comments

Hey, I got the Hugging Face GPT-2 Large model of DialoGPT (pretrained, I assume?) and I have tried asking it questions, but it doesn't seem to return anything interesting. I asked it "what is the meaning of life?" and it said "to be a good boy." I'm confused about why it didn't pick up anything from Reddit. Am I missing something? Am I supposed to train it myself on the Reddit dataset to get outputs similar to what was described?

ArEnSc avatar Jun 07 '20 03:06 ArEnSc

Okay, so I think in Microsoft's Hugging Face release, the decoder outputs were nerfed to prevent anything controversial from being said.

ArEnSc avatar Jun 07 '20 19:06 ArEnSc

@ArEnSc , were you able to get the expected results? I also tried the same model and it's not giving quality replies. Not sure what I am missing here

chiranshu14 avatar Feb 20 '21 02:02 chiranshu14

@chiranshu14 you have to use a custom decoder; I think they filter out everything, or nerfed the model.

ArEnSc avatar Feb 20 '21 15:02 ArEnSc

@ArEnSc Thanks for your response. It's surprising to me because they mentioned that it is ready to use. I'll look up more on using a custom decoder.

chiranshu14 avatar Feb 20 '21 16:02 chiranshu14

@chiranshu14 hey, if I point you in the right direction, can you help me out with the results later? I think I know how to get the results; I just need someone to do it, since I'm super busy working on other things.

ArEnSc avatar Feb 20 '21 17:02 ArEnSc

@ArEnSc sure Michael, could you please tell me more about what exactly needs to be done here? I'm kind of new to this; by using a custom decoder, do you mean fine-tuning with a different dataset? Thanks, Chiranshu

chiranshu14 avatar Feb 21 '21 00:02 chiranshu14

You just need to load the model (hopefully they didn't nerf it). Here is the script; explore it a bit. It works similarly to the original paper. Let me know if the results are better:

https://colab.research.google.com/drive/1PslHE4Rl4RqSa20s7HEp0ZKITBir6ezE
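For reference, the same idea can be sketched directly with the Hugging Face transformers library. This is a minimal sketch, not the notebook's exact code; the sampling settings (top_k, top_p, temperature) are illustrative assumptions, and greedy decoding without them tends to produce the bland replies described above:

```python
# Sketch: querying DialoGPT through Hugging Face transformers with a
# sampling decoder. Decoding parameters are illustrative, not canonical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")

prompt = "What is the meaning of life?" + tokenizer.eos_token
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output_ids = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,        # sample instead of greedy search
    top_k=50,
    top_p=0.95,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the tokens generated after the prompt.
reply = tokenizer.decode(
    output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True
)
print(reply)
```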

ArEnSc avatar Feb 21 '21 04:02 ArEnSc

@ArEnSc yes, I have been trying this and the other sample decoding scripts on the README page. None of them have worked so far; they all have issues while loading the weights. Some weights seem to be missing.

chiranshu14 avatar Feb 22 '21 05:02 chiranshu14

@ArEnSc @chiranshu14 maybe you can try this script: python src/generation.py play -pg=restore/medium_ft.pkl --sampling. It's from DialogRPT, a dialog response ranking model from our group and a follow-up work to DialoGPT.
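DialogRPT checkpoints are also published on the Hugging Face hub, so the ranker can be tried without the repo's scripts. A minimal sketch, assuming the microsoft/DialogRPT-updown checkpoint and the <|endoftext|> context/response separator used by DialogRPT:

```python
# Sketch: scoring candidate replies with the DialogRPT "updown" ranker.
# Model name and separator token are assumptions based on the hub checkpoint.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialogRPT-updown")
ranker = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/DialogRPT-updown"
)

def score(context: str, response: str) -> float:
    """Higher score = response predicted to attract more upvotes."""
    inputs = tokenizer(
        context + "<|endoftext|>" + response, return_tensors="pt"
    )
    with torch.no_grad():
        logits = ranker(**inputs).logits
    return torch.sigmoid(logits).item()

# Rank a few candidate responses for a context.
candidates = ["I'm doing great, thanks for asking!", "idk"]
ranked = sorted(candidates, key=lambda r: score("How are you?", r), reverse=True)
print(ranked[0])
```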

golsun avatar Feb 22 '21 06:02 golsun

@golsun @ArEnSc

Thanks for your response Xiang Gao, I'll give that a try today. Will keep you both updated about how it goes.

chiranshu14 avatar Feb 22 '21 06:02 chiranshu14

@golsun is this the same technique that was used to train Tay on Twitter? It seems like it. Thanks. @chiranshu14, what @golsun suggested doesn't work; I have an earlier model of this lying around somewhere, I just have to find it.

ArEnSc avatar Feb 22 '21 15:02 ArEnSc

No it’s not related to Tay.

golsun avatar Feb 22 '21 16:02 golsun

Oh thanks, yeah, I watched the video. Essentially, it helps surface better responses by predicting which ones are more preferred as a reply. Smart!

Is there any way to get Tay-like learning behaviour from any project Microsoft has open-sourced? It seems like Tay was doing online learning based on ranking of specific things said to it, using feedback from the community.

ArEnSc avatar Feb 22 '21 18:02 ArEnSc

@golsun Does a higher rank mean better response?

For these simple examples with your demo notebook, the results are not as expected.

Example 1 - Context: Hi

0.301  gen 0.367  ranker 0.301  Hi! How are you?!?!?!?!?!?!?!?!?!?!?!?
0.300  gen 0.370  ranker 0.300  Hi! How are you?!?!?!?!?!?!?!?!?!?!?!??
0.299  gen 0.365  ranker 0.299  Hi! How are you?!?!?!?!?!?!?!?!?!?!??!?
... 60+ near-duplicate "Hi! How are you?!..." variants ...
0.223  gen 0.131  ranker 0.223  Hi! How are ya
0.214  gen 0.153  ranker 0.214  Hello!
0.196  gen 0.166  ranker 0.196  Hi!
0.190  gen 0.307  ranker 0.190  Hello! :D
0.186  gen 0.338  ranker 0.186  Hi! :D

Example 2 - Context: Does money buy happiness?

0.566  gen 0.415  ranker 0.566  Money buys happiness, yes. Money buys happiness, no. Money buys happiness, yes. Money buys happiness, no. Money buys happiness, no
0.561  gen 0.409  ranker 0.561  Money buys happiness, yes. Money buys happiness, no. Money buys happiness, yes. Money buys happiness, yes. Money buys happiness, yes
... many more repetitive "Money buys happiness" variants ...
0.466  gen 0.308  ranker 0.466  Money can buy happiness, but it can't buy happiness.
0.411  gen 0.321  ranker 0.411  Money can buy happiness, but it can also buy a lot of things.
0.392  gen 0.329  ranker 0.392  Money can buy happiness, but it can also buy a lot of other things.
0.331  gen 0.173  ranker 0.331  Money buys everything
0.306  gen 0.291  ranker 0.306  Money buys happiness.

Example 3 - Context: Can Porsche beat Tesla with its new Taycan EV?

0.589  gen 0.168  ranker 0.589  No, but they can beat a Tesla with a Tesla. That's the point of the article
0.565  gen 0.198  ranker 0.565  No, but they can beat a Tesla with a Tesla. That's the point of the article, right?
... many more "beat a Tesla with a Tesla" variants ...
0.506  gen 0.215  ranker 0.506  No, but they can beat a Tesla with a Porsche.
0.428  gen 0.212  ranker 0.428  No, but they can beat a Tesla.
0.407  gen 0.206  ranker 0.407  No, but they can beat the Tesla.
0.393  gen 0.153  ranker 0.393  No, but they can beat the Tesla

Am I using the wrong model?

chiranshu14 avatar Feb 23 '21 01:02 chiranshu14

@chiranshu14 try this model https://www.dropbox.com/s/ipgybswhwbszqb1/dialogpt2_large_fs.pkl?dl=0

ArEnSc avatar Feb 23 '21 05:02 ArEnSc

@chiranshu14 the results are better than the "I don't know" replies, which I get a lot, haha.

ArEnSc avatar Feb 23 '21 15:02 ArEnSc

@ArEnSc @golsun

python src/generation.py play -pg=restore/medium_ft.pkl -pr=restore/updown.pth --sampling

This command worked perfectly, got awesome results. I'm going to play around with this some more. But I think this looks perfect so far.

Thank you @golsun for suggesting (and building) DialoRPT. Really liked the idea of ranking the responses & @ArEnSc for your guidance!

chiranshu14 avatar Feb 23 '21 23:02 chiranshu14

Great! Thanks @chiranshu14 for trying our DialogRPT!

golsun avatar Feb 24 '21 05:02 golsun

Thank you @ArEnSc for trying DialogRPT! Sorry, I'm not aware of an open-sourced repo similar to Tay.

golsun avatar Feb 24 '21 05:02 golsun

@golsun @ArEnSc

Since this model is huge and requires a GPU to compute inferences, do you know if there is any simpler way to generate responses? Can we make it run in JS to minimize the response time? I know it's very difficult, but can we port it to TFLite or something similar that won't take much time/space?

chiranshu14 avatar Feb 24 '21 05:02 chiranshu14

@chiranshu14 I think if you want to deploy this there are two ways: distill the model (which requires a lot of work) and then run it with JS on the client side, or create a FastAPI server with a REST- or socket-based API, host the model there, and send requests across the net and wait for the response. Otherwise you really have no way of using this, unless you host it offline. Is there no CPU inference mode? I think there is, and it's slow.

ArEnSc avatar Feb 24 '21 14:02 ArEnSc

@ArEnSc

Yes, I tried with my CPU (3700X) and it was still slow. It's going to be slower on a server, then, so a REST API won't be an option for me.

I'll look into distilling a model. I guess that's my only option, as I cannot use a server with a GPU.

Thanks!!
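Besides distillation, dynamic quantization is a much cheaper way to try speeding up CPU inference. This was not suggested in the thread; it is a sketch under the assumption that int8 linear layers help your workload, and note the GPT-2 caveat in the comments:

```python
# Sketch: dynamic int8 quantization for faster CPU inference.
# Caveat: GPT-2-style models (including DialoGPT) implement most
# projections with transformers' custom Conv1D module, which
# quantize_dynamic does not cover, so only nn.Linear layers (e.g. the
# LM head) are converted here. Speedup and quality impact must be measured.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# `quantized` still exposes generate(), so it drops into the same
# decoding code as the full-precision model.
```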

chiranshu14 avatar Feb 24 '21 14:02 chiranshu14

@chiranshu14 do you have the approximate inference time? Also, what changed that made this work? Did you just load a different model?

ArEnSc avatar Feb 24 '21 14:02 ArEnSc

@ArEnSc with a GPU it took around 1-2 seconds, while on CPU it took approximately 8 seconds. First I tried the large model that you had linked, but had issues loading the weights. The medium model worked fine; just use the command: python src/generation.py play -pg=restore/medium_ft.pkl -pr=restore/updown.pth --sampling

Also, I noticed that if we do not use updown.pth the results are not as good, so that's important.

chiranshu14 avatar Feb 25 '21 00:02 chiranshu14

@chiranshu14 thanks and good luck with what you are working on

ArEnSc avatar Feb 25 '21 00:02 ArEnSc

@chiranshu14 after analyzing, are the results you got satisfying or not?

alan-ai-learner avatar Jul 09 '21 10:07 alan-ai-learner

@alan-ai-learner The model was good. It wasn't as perfect as the examples on the repo home page, but satisfactory. However, I decided to drop the idea of using DialoGPT or similar models because there's no assurance that they will give good responses. It's trained on Reddit, so it might suddenly start talking about drugs and stuff 😅, which I don't want my app users to see. It feels more like a proof of concept to me than a real deployable model. You can try fine-tuning it on your own dataset, which I did try, but I still wasn't satisfied.

I've decided to go with Rasa. It's deterministic.

chiranshu14 avatar Jul 09 '21 10:07 chiranshu14

I think the issue is that most people have high expectations of these models. You likely switch over to this when your rules-based model fails, and things go off script. Even GPT-3 isn't that great. It really depends on your audience and their expectations.

ArEnSc avatar Jul 10 '21 23:07 ArEnSc

@golsun @ArEnSc can you guys please take a look at this issue and help?

alan-ai-learner avatar Jul 11 '21 03:07 alan-ai-learner