Repo not properly designed

Open dimeldo opened this issue 4 years ago • 4 comments

Thank you for your contribution. I noticed a few things about your repo:

  • Variable names are vague and uninformative, which makes it hard to understand the code and adapt it.
  • There are some global variables in LSP_train.py which confuse me: first, because they're not informative and don't make their purpose clear; second, because they seem to interfere with the parse-args variables that serve the same purpose. Please consider using only parse-args and explaining the purpose of each one.
  • This repository doesn't seem to contain an interact script to chat with the models.
  • You don't give information about the finetuned models that you're providing, e.g. what max_length they were trained with, how many epochs over the training data, etc.

dimeldo avatar Oct 16 '19 12:10 dimeldo

Hi dimeldo,

Thanks for your comments. Regarding your question

  1. Though we try to make variable names as intuitive as possible, they may still look vague, so we will definitely add more comments. Please just point out where you see the problem and we will modify accordingly.

  2. Those three global variables simply define "what is INFINITY", "when to cache the memory for PyTorch" and "how often we run eval on the dev set (number of steps)"; we will add comments accordingly. (A sketch of how they could be exposed as parse-args flags is below, after this list.)

  3. Please see our comments on "Model decoding", quoted below: "We note that even with properly filtered Reddit dataset, sometimes our model can still generate moderately toxic/inappropriate responses. Due to this reason, we are unable to provide the decoding script at this time (the live demo and decoding script access is by invitation only for now). We are currently still working on a controlled decoding method to prevent this system from toxic generation. Please stay tuned."

  4. We trained with a max_length of 128, and the number of epochs is 5 to 10, depending on the model size.
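
Regarding point 2, here is a minimal sketch of how those three values could be exposed as parse-args flags instead of globals; the flag names and defaults are hypothetical, not the actual identifiers in LSP_train.py:

    import argparse

    # Hypothetical names and defaults; the actual globals in LSP_train.py may differ.
    parser = argparse.ArgumentParser()
    parser.add_argument("--inf", type=float, default=float("inf"),
                        help="value used as 'infinity', e.g. to initialize the best eval loss")
    parser.add_argument("--cache_empty_steps", type=int, default=1000,
                        help="how often (in steps) to release cached GPU memory via torch.cuda.empty_cache()")
    parser.add_argument("--valid_steps", type=int, default=500,
                        help="how often (in steps) to run eval on the dev set")
    args = parser.parse_args()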

Please open this issue again if you have more questions.

intersun avatar Oct 16 '19 20:10 intersun

Is 128 the max_length of the entire conversation?

So you believe we should spend money renting a powerful server, like 8 Nvidia V100s, to train with your code without being able to decode and generate outputs from the model? That doesn't make much sense...

dimeldo avatar Oct 18 '19 18:10 dimeldo

Yes.

No, we provided the pre-trained models so you don't have to train them again yourself. Please refer to our README for how to download those pre-trained models. But you will have to write the decoding script yourself, unless we figure out a way to filter the toxic responses.
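
If you need a starting point, a minimal sketch of such a decoding loop is below. It assumes the HuggingFace transformers library and the microsoft/DialoGPT-medium checkpoint on the Hub, it is not our official script, it applies no toxicity filtering, and the sampling settings are illustrative only.

    # Minimal interactive decoding sketch; no toxicity filtering is applied.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
    model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

    history_ids = None
    while True:
        text = input(">> User: ")
        new_ids = tokenizer.encode(text + tokenizer.eos_token, return_tensors="pt")
        # Keep the whole conversation in one token sequence, as during training.
        input_ids = new_ids if history_ids is None else torch.cat([history_ids, new_ids], dim=-1)
        # max_length=128 matches the context window the models were fine-tuned with;
        # a real script would also truncate the history once it exceeds that window.
        history_ids = model.generate(
            input_ids,
            max_length=128,
            pad_token_id=tokenizer.eos_token_id,
            do_sample=True,
            top_p=0.9,
        )
        reply = tokenizer.decode(history_ids[:, input_ids.shape[-1]:][0], skip_special_tokens=True)
        print("Bot:", reply)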

Let me know if you have any other questions.

intersun avatar Oct 18 '19 20:10 intersun

Nearly 3 years have passed; any results with the filtering? Maybe you could let devs do it themselves?

nikich340 avatar Sep 06 '22 11:09 nikich340