DialoGPT

Large-scale pretraining for dialogue

64 DialoGPT issues

When I run distributed training with more than one GPU, training gets stuck at the very beginning and hangs indefinitely. It is stuck in FP16_Optimizer#set (specifically at [this line](https://github.com/NVIDIA/apex/blob/3d01e4a0a188cc8df54bc6e44cf5eb40ff6b4cc5/apex/optimizers/fp16_optimizer.py#L122), where...
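A minimal sanity check, assuming the hang is in collective communication rather than in the training loop itself (the file name `check_dist.py` is hypothetical): launch it with `python -m torch.distributed.launch --nproc_per_node=2 check_dist.py` and `NCCL_DEBUG=INFO` exported to see which rank stalls.

```python
import argparse

import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
# torch.distributed.launch injects --local_rank into each worker process
parser.add_argument("--local_rank", type=int, default=0)
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl", init_method="env://")

# If this barrier completes on every rank, NCCL and the process-group setup
# are healthy, and the hang is more likely inside the optimizer's gradient
# all-reduce; if it stalls here, the problem is in the distributed setup.
dist.barrier()
print(f"rank {dist.get_rank()}/{dist.get_world_size()} passed the barrier")
```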

How can we use our own data to train the model?

My dataset is a .txt file where each line is an entire dialogue, with the turns inside a dialogue separated by tabs. How can I convert it to your tour...
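A rough conversion sketch, assuming each line of the .txt file is one dialogue with tab-separated turns and that the goal is (context, response) pairs with turns joined by GPT-2's EOS token. The file names and the output layout are illustrative assumptions; the repo's prepro.py defines the exact format it actually expects.

```python
# Assumption: "dialogues.txt" has one dialogue per line, turns separated by tabs.
EOS = "<|endoftext|>"

def dialogue_to_pairs(line):
    turns = [t.strip() for t in line.rstrip("\n").split("\t") if t.strip()]
    pairs = []
    for i in range(1, len(turns)):
        # Context = all turns so far joined by EOS, response = the next turn.
        context = f" {EOS} ".join(turns[:i])
        pairs.append((context, turns[i]))
    return pairs

with open("dialogues.txt", encoding="utf-8") as fin, \
     open("pairs.tsv", "w", encoding="utf-8") as fout:
    for line in fin:
        for context, response in dialogue_to_pairs(line):
            fout.write(f"{context}\t{response}\n")
```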

Hey guys! Awesome work. Can you please clarify: is there a reason to train the model on data that contains more than just N-turn samples if I want to use the model in the N-turn...

I plan to train it from scratch on my past 10,000 emails (not in English). Do you think that would make sense with that amount of training data?

Great job! Thanks for your contributions to dialogue generation! Is there any way I can get the 27 GB Reddit dialogue data (147,116,725 dialogue instances) without running demo.py?

Hi! I ran the training script on 130 million training instances and got the following training speed: 1 V100 GPU (FP16 O2): ~14k tokens/sec, ~100 hours; 8 V100 GPUs, ...
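A back-of-the-envelope check of those numbers, under the assumption that the ~100 hours corresponds to a single pass over the 130 million instances on one V100:

```python
# Assumption: ~100 hours is one full pass over the data on a single V100.
tokens_per_sec = 14_000
hours = 100
instances = 130_000_000

total_tokens = tokens_per_sec * hours * 3600        # ~5.0e9 tokens processed
tokens_per_instance = total_tokens / instances      # ~39 tokens per instance
print(f"{total_tokens:.2e} tokens, ~{tokens_per_instance:.0f} tokens/instance")
```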

Hey guys! Great work! I really appreciate it! After reading the code, I noticed that the training data is from 12/2015 to 11/2017, while the test data is from 03/2018...

Can you guys share the hyperparameters of different model sizes i.e. small, medium, and large? https://github.com/microsoft/DialoGPT/blob/75a4197188a1addf22c5eaea23f16d3b598635d7/LSP_train.py#L46-L82

It is really great work. I wonder if you could share the hyperparameters used to pre-train DialoGPT, especially the hyperparameters for GPT-small.
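On the architecture side, DialoGPT's small/medium/large sizes follow the standard GPT-2 configurations; the sketch below (using Hugging Face's `GPT2Config` purely for illustration, not the repo's own config code) shows those dimensions. The optimizer settings the issues ask about (learning rate, batch size, warmup) appear to be set in LSP_train.py's argument defaults linked above.

```python
# Standard GPT-2 model dimensions that DialoGPT's three sizes build on.
from transformers import GPT2Config

configs = {
    "small":  GPT2Config(n_layer=12, n_embd=768,  n_head=12),  # ~117M params
    "medium": GPT2Config(n_layer=24, n_embd=1024, n_head=16),  # ~345M params
    "large":  GPT2Config(n_layer=36, n_embd=1280, n_head=20),  # ~762M params
}
for name, cfg in configs.items():
    print(f"{name}: layers={cfg.n_layer}, hidden={cfg.n_embd}, heads={cfg.n_head}")
```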