pGit1
My network learned from data. I used the CIFAR 768 model, but: 1. it takes 5 minutes per epoch to train on a P600, and 2. my results were nowhere near...
That is weird. Not sure why that would be. In my example I used Cutout and some other augmentation techniques and was still that far off. If their model is that...
Having a hard time interpreting this cloud with the dots in the middle of it, and I can't find what it means in the paper. As a result I...
@Agent007 I am not sure I am following. So h_i is the concatenated output of h_{i-1}, and the dotted lines represent the concatenated output of h_{i-2}??
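In case it helps make the concatenation concrete, here is a minimal sketch of a node whose input is the concatenation of the two previous hidden states. Everything here is a toy (numpy stand-in, made-up shapes and names), not the paper's actual architecture:

```python
import numpy as np

# Hypothetical sketch: node i receives the CONCATENATION of the two
# previous hidden states h_{i-1} and h_{i-2} (shapes are made up).
hidden = 4
h_im2 = np.ones((1, hidden))        # h_{i-2}, e.g. carried by the dotted line
h_im1 = np.ones((1, hidden)) * 2.0  # h_{i-1}, the direct input

# Concatenate along the feature axis, then apply node i's transform.
x = np.concatenate([h_im1, h_im2], axis=-1)   # shape (1, 2*hidden)
W = np.zeros((2 * hidden, hidden))
np.fill_diagonal(W[:hidden], 1.0)             # toy weight matrix
h_i = np.tanh(x @ W)                          # h_i, shape (1, hidden)
print(x.shape, h_i.shape)                     # (1, 8) (1, 4)
```

So the feature dimension doubles before the node's own transform brings it back down.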
@oobabooga sorry for the bad question, but how do we get these updates? Not sure which library to `pip install`.
Sorry. I don't actually have this repo installed. From my research it looks like the latest iteration of PEFT needs to be pulled down. Thanks for your help! I am going...
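For anyone else landing here, "pulled down" most likely means installing PEFT straight from its GitHub repo via pip. This is my reading of the comment, not something confirmed in this thread:

```shell
# Hedged sketch: install the latest PEFT directly from source.
# Pin a release tag instead if you need a stable version.
pip install git+https://github.com/huggingface/peft.git
```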
@ItsLogic Can you show what your trainer args and hyperparameters are for the 13B training run? My models seem to take WAY longer than 10 hours to train. on...
@ItsLogic never mind. The longer training time definitely stemmed from the cutoff len going from 256 to 512.
@zhangfaen I think ALL of this "supervised" finetuning confusion stems from **annoying** use of terms on the part of the community, as popularized by the "SFT" portion of this paper: https://openreview.net/pdf?id=TG8KACxEON See...
@zhangfaen My above answer is mostly correct. I answered my own question. All these people are doing is next-word prediction in the standard "teacher forcing" setup. It's just all **obfuscated**...
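To make that concrete, here is a minimal sketch of next-token prediction with teacher forcing: the labels are just the input sequence shifted by one position, and the loss is ordinary cross-entropy. Random logits stand in for a real model, and all names here are mine:

```python
import numpy as np

# One tokenized training sequence (toy vocabulary of size 10).
tokens = np.array([2, 5, 7, 1, 4])

# Teacher forcing: at step t the model sees the TRUE tokens 0..t and
# must predict token t+1, so labels are the inputs shifted by one.
inputs = tokens[:-1]   # [2, 5, 7, 1]
labels = tokens[1:]    # [5, 7, 1, 4]

vocab_size = 10
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(inputs), vocab_size))  # stand-in for model output

def cross_entropy(logits, labels):
    # Numerically stable log-softmax, then pick out the label log-probs.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

loss = cross_entropy(logits, labels)
print(loss > 0)  # True: plain next-token cross-entropy, nothing more
```

"SFT" just swaps in instruction/response pairs as the sequences; the objective is unchanged.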