> I think it is one batch, which is a common practice in GANs.

Thanks a lot, I will give it a try.
Hi. What happens to the accuracy if the discriminator is too strong?
Thanks for your kind response. Below is the detailed information:

- model: thred /random300 embedding
- data: 5-turn dataset from Reddit
- TensorFlow version: 1.12.0
I have the same problem.
> I was able to train using DeepSpeed on 8 V100 GPUs. Here are the training script and the DeepSpeed config file.
>
> torchrun --nproc_per_node=8 --master_port=9776 train.py --model_name_or_path hf_model/llama-7b --data_path...