Nathan Cooper
@TheHmmka I trained the larger model on one of my school's machines, which had four 1080 Ti GPUs. I'm sure you could train it on a cloud service relatively easily though, but...
Hey @etrigger, could you show me the error you are getting when trying to download or generate the data? I tried to reproduce this, but it worked for me.
@etrigger what an interesting error. I did a bit of digging and it seems to be an issue with Colab in certain situations. Here is an issue about it: https://github.com/googlecolab/colabtools/issues/1771,...
@etrigger I describe the format that DialoGPT requires in the data section of my blog: https://nathancooper.io/i-am-a-nerd/chatbot/deep-learning/gpt2/2020/05/12/chatbot-part-1.html#The-Data!. I recommend first trying to get it into a format that my code expects...
> > (don’t tell my Ph.D. advisor I said that). > > _laughs in spanish_ > > My brain got me at the ball and bat though. > > One...
Testing comment feature!
@samyam thanks for the comment and for discussing the single-GPU use case; it clears up a lot of my confusion. I will test that out and update with...
@samyam I did what you recommended and got much better results using larger batch sizes (double the batch size for the t5-large model compared to not using DeepSpeed). One...
@thakkarparth007 that is a good point that I didn't think of. To me it does seem like gradient accumulation would be better for most cases except for the one you...
@thiswillbeyourgithub I think I addressed all your comments. Lemme know what you think.