pGit1


My network learned from data. I used the CIFAR 768 model, but 1) it takes 5 minutes per epoch to train on a P600, and 2) my results were nowhere near...

That is weird. Not sure why that would be. In my example I used Cutout and some other augmentation techniques and was that far off. If their model is that...

![image](https://user-images.githubusercontent.com/13975114/36337142-e27a488c-135d-11e8-84a9-6e16678971b9.png) I'm having a hard time interpreting this cloud with the dots in the middle of it, and I can't find what it means in the paper. As a result I...

@Agent007 I am not sure I am following. So h_i is the concatenated output of h_{i-1}, and the dotted lines represent the concatenated output of h_{i-2}?
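To make the question concrete, here is a toy sketch of that reading: each hidden state is the concatenation of the outputs of the two preceding states, h_{i-1} (solid lines) and h_{i-2} (dotted lines). The function names and the two-step skip are my own assumptions for illustration, not taken from the paper.

```python
# Toy interpretation: h_i concatenates the features of h_{i-1} and h_{i-2}.
# Plain list concatenation stands in for a tensor concat along the
# feature dimension (e.g. torch.cat).

def step(h_prev, h_prev2):
    """Hypothetical 'layer': concatenate the two incoming feature vectors."""
    return h_prev + h_prev2

h = [[1.0], [2.0]]  # h_0 and h_1, each a tiny one-feature 'vector'
for i in range(2, 5):
    h.append(step(h[i - 1], h[i - 2]))

print(len(h[4]))  # feature width grows as earlier outputs are concatenated
```

If this reading is right, the feature width compounds with depth, which is why such diagrams usually show the channel count growing layer by layer.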

@oobabooga Sorry for the bad question, but how do we get these updates? Not sure which library to `pip install`.

Sorry, I don't actually have this repo installed. From my research, it looks like the latest iteration of PEFT needs to be pulled down. Thanks for your help! I am going...
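For anyone landing here with the same question, a minimal sketch of "pulling down the latest PEFT" — assuming the needed change is on PEFT's main branch rather than the PyPI release:

```shell
# Install the latest PEFT straight from GitHub (bypasses the PyPI release):
pip install git+https://github.com/huggingface/peft.git

# Or keep an editable checkout you can refresh later with `git pull`:
git clone https://github.com/huggingface/peft.git
cd peft
pip install -e .
```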

@ItsLogic Can you show what your trainer args and hyperparameters are for the 13B training run? My models seem to take WAY longer than 10 hours to train. On...

@ItsLogic Never mind. The longer training time definitely stemmed from the cutoff len going from 256 to 512.
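A rough back-of-the-envelope for why the cutoff length matters so much (my own arithmetic, not from the repo): examples are truncated to `cutoff_len` tokens, and self-attention cost grows quadratically with sequence length, so doubling 256 to 512 means roughly 4x the attention work per step on top of ~2x the tokens everywhere else.

```python
# Sketch of the cutoff-length effect; cutoff_len and the cost model
# (attention as O(n^2)) are assumptions for illustration.

def truncate(token_ids, cutoff_len):
    """Clip a tokenized example to the training cutoff length."""
    return token_ids[:cutoff_len]

def relative_attention_cost(old_len, new_len):
    """Ratio of attention FLOPs between two sequence lengths, taking
    attention as O(n^2) in the sequence length n."""
    return (new_len ** 2) / (old_len ** 2)

print(relative_attention_cost(256, 512))          # 4.0
print(len(truncate(list(range(1000)), 512)))      # 512
```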

@zhangfaen I think ALL of this "supervised" finetuning confusion stems from the **annoying** use of terms on the part of the community, as popularized by the "SFT" portion of this paper: https://openreview.net/pdf?id=TG8KACxEON See...

@zhangfaen My above answer is mostly correct. I answered my own question. All these people are doing is next-word prediction in a standard "teacher forcing" setup. It's just all **obfuscated**...
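The "teacher forcing" setup above can be sketched in a few lines: the model's inputs are the sequence minus its last token, and the labels are the same sequence shifted left by one, so the "supervision" at each position is simply the next token. The token ids below are made up for illustration.

```python
# Minimal sketch of causal-LM "SFT": next-token prediction with
# teacher forcing. No model needed to see the data layout.

def make_lm_pair(token_ids):
    """Build (inputs, labels) for causal language-model training.

    At position t the model sees token_ids[: t + 1] and is asked to
    predict token_ids[t + 1] -- the ground-truth next token is fed in
    regardless of what the model predicted (teacher forcing).
    """
    inputs = token_ids[:-1]
    labels = token_ids[1:]
    return inputs, labels

ids = [101, 7592, 2088, 102]  # e.g. [BOS, "hello", "world", EOS]
inputs, labels = make_lm_pair(ids)
print(inputs)  # [101, 7592, 2088]
print(labels)  # [7592, 2088, 102]
```

Everything else (prompt templates, loss masking on the prompt tokens) is bookkeeping layered on top of this same shifted-pair construction.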