NanoCode012

Results 163 comments of NanoCode012

Are you sure it's due to CPU offload? Can you try using the regular ds3_bf16?

I know this is 3 years too late, but I would like to add this for future readers. I implemented this feature in my fork but did not create a...

@ElleLeonne , thank you for answering. I also see your loss being 0. Isn't that incorrect? I don't think it should go that low, right? I attached a sample training...

Hello @ElleLeonne , thanks for the reply. > when switching to a new dataset I noticed this issue originally with a custom dataset but was also able to reproduce it...

> Yes, the original cleaned version worked fine. After fixing the problem, Loss appears to stay steady for a single epoch. @ElleLeonne , may I clarify which model size you...

> 7bn works with the cleaned alpaca dataset, and Another dataset of mine that uses a similar, yet not identical, format, with different key names. > […] @ElleLeonne , have...