Daniel Holler
I might start to understand a little bit here. Would the idea be to generate a single token at a random location for each batch by giving the model: `(prompt)...
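To make that concrete, here's a minimal sketch of what I have in mind (all names are hypothetical, and I'm assuming the prompt tokens and Encodec indices already share one vocabulary): for each batch element we cut the audio-token sequence at a random position and train on predicting only that single next token.

```python
import torch

def make_single_token_batch(prompt_ids, audio_ids):
    """prompt_ids: (B, P) prompt tokens, audio_ids: (B, T) GT Encodec indices.
    Returns padded inputs and one target token per batch element."""
    B, T = audio_ids.shape
    # random cut point per batch element (at least 1 audio token of context)
    t = torch.randint(low=1, high=T, size=(B,))
    x = [torch.cat([prompt_ids[b], audio_ids[b, : t[b]]]) for b in range(B)]
    y = audio_ids[torch.arange(B), t]  # the single GT token to predict
    x = torch.nn.utils.rnn.pad_sequence(x, batch_first=True, padding_value=0)
    return x, y
```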
I've modified the training loop to be as I described above, and it seems to work (although I'm not 100% sure that the GT Encodec indices are correctly determined). Somewhere...
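On the GT Encodec indices: this is roughly how I'm extracting them (using the `encodec` package directly; the file path is just a placeholder), so if the index layout here looks wrong, that would explain it.

```python
import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # 8 codebooks at 6 kbps

wav, sr = torchaudio.load("sample.wav")  # placeholder path
wav = convert_audio(wav, sr, model.sample_rate, model.channels)

with torch.no_grad():
    encoded_frames = model.encode(wav.unsqueeze(0))

# (batch, n_codebooks, time) ground-truth indices to use as targets
codes = torch.cat([frame[0] for frame in encoded_frames], dim=-1)
print(codes.shape)
```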
Still some work to do in the dataloader to ensure proper windows (aka blocks in nanoGPT) of X,Y data are prepared for good training; see the sketch below. Right now useless data might be used...
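For reference, the nanoGPT-style batching I'm aiming for is basically this (a sketch; the hyperparameters are placeholders):

```python
import numpy as np
import torch

block_size = 256  # context window ("block" in nanoGPT terms)
batch_size = 8

def get_batch(data):
    """data: 1-D numpy array of token ids for one split (train/val)."""
    # sample random window starts so every (x, y) pair is a full, valid block
    ix = torch.randint(len(data) - block_size, (batch_size,))
    x = torch.stack([torch.from_numpy(data[i : i + block_size].astype(np.int64)) for i in ix])
    y = torch.stack([torch.from_numpy(data[i + 1 : i + 1 + block_size].astype(np.int64)) for i in ix])
    return x, y
```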
I think I've isolated the backward error (see above) to the caching mechanism in the model. Will work on a solution today and then we should be able to have...
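Something along these lines is what I'm planning to try: force the cache off for training forwards, since stale cached tensors are what seem to break `backward()`. The attribute names here are guesses, not the real ones in the model.

```python
def disable_inference_cache(model):
    """Sketch: walk the model and clear/disable any KV-cache state before training.
    The attribute names (kv_cache, use_cache) are assumptions, not the real ones."""
    for module in model.modules():
        if hasattr(module, "kv_cache"):
            module.kv_cache = None    # drop stale cached keys/values
        if hasattr(module, "use_cache"):
            module.use_cache = False  # make forward() skip the cache path

# usage (before the training loop):
# disable_inference_cache(model)
```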
> in the middle of finishing something, haven't had time to look at this, will do it soon, sorry! Completely understand, no pressure! I'm just posting updates for whenever you...
I have the model training the LoRA layers now, but the data preparation process is currently garbage and I'm probably also calculating the loss with a unit mismatch between the prompt...
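One thing I'll probably do while cleaning this up is mask the prompt positions out of the loss entirely, so only the target tokens contribute. Rough sketch (the helper name and shapes are my own, not from the repo):

```python
import torch
import torch.nn.functional as F

def masked_ce_loss(logits, targets, prompt_len):
    """logits: (B, T, V); targets: (B, T). Ignore the prompt positions in the loss."""
    targets = targets.clone()
    targets[:, :prompt_len] = -100  # positions to ignore
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=-100,
    )
```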
> > I think I've isolated the backward error (see above) to the caching mechanism in the model.
>
> this shouldn't be used during training
>
> > Haven't...
The LoRA layer (1st layer in the model) has 98k trainable parameters. Will try training now to validate the current code.
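For context, the LoRA layer is conceptually something like this (a simplified sketch, not the exact code; the 98k figure depends on the real in/out dims and rank):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a low-rank trainable update (sketch)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16, dropout: float = 0.05):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # only the low-rank factors train
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # start as a no-op
        self.scaling = alpha / rank
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(self.dropout(x))) * self.scaling

def count_trainable(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# e.g. wrapping a 2048x2048 projection at rank 8 gives 2 * 2048 * 8 = 32,768 trainable params
layer = LoRALinear(nn.Linear(2048, 2048), rank=8)
print(count_trainable(layer))
```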
Currently sweeping the learning rate and the LoRA rank, alpha, and dropout. Graphs look like this so far:  I'm unsure whether the data is fed into the model properly, @vatsalaggarwal. I measure...
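The sweep itself is just a simple grid, something like this (the values shown are placeholders, and `train(...)` is a stand-in for the actual entry point):

```python
import itertools

# hypothetical sweep grid, not the values from my actual runs
learning_rates = [1e-4, 3e-4, 1e-3]
ranks = [4, 8, 16]
alphas = [8, 16, 32]
dropouts = [0.0, 0.05, 0.1]

for lr, r, a, d in itertools.product(learning_rates, ranks, alphas, dropouts):
    run_name = f"lora_lr{lr}_r{r}_a{a}_d{d}"
    # train(run_name, lr=lr, rank=r, alpha=a, dropout=d)  # hypothetical entry point
    print(run_name)
```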
> I think @lucapericlp is close to turning this into a working solution (without LoRA), so might be better to add LoRA to that... hopefully should be out by EOD....