Daniel Han
@dkobak Oh yep. PCA scaling is done, although slightly differently. I think (can't remember 100%) it was uniform but within [-0.0001f, 0.0001f]. Interesting! n/early_exaggeration sounds much better then. Oh yes...
@resnant Oh I just checked. Seems like I was wrong. I was working on https://github.com/rapidsai/cuml/pull/1383 [TSNE PCA Init + 50% Memory Reductions]. It included PCA Init, 50% mem reductions and...
@resnant Thanks a lot!!! I'll see what I can do.
Does the folder "checkpoints/checkpoint-3600" exist? Maybe it's corrupted?
Do you know exactly where / which line the error pops out?
You're correct! It seems like `max_seq_length`'s default of 4096 is auto scaling TinyLlama, causing bad outputs - I'll fix this asap - thanks for the report!
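In the meantime, a rough sketch of a workaround (assuming the usual `FastLanguageModel.from_pretrained` arguments, and the model name here is just an example) is to pass TinyLlama's native context length explicitly so no auto scaling kicks in:

```python
from unsloth import FastLanguageModel

# TinyLlama was trained with a 2048 context window, so asking for 4096
# triggers automatic scaling - pass the native length instead.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/tinyllama-bnb-4bit",  # example model name
    max_seq_length = 2048,   # match TinyLlama's trained context length
    load_in_4bit = True,
)
```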
I think you need to use `DataCollatorForLanguageModeling` or `DataCollatorForSeq2Seq`
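Something along these lines should work (a rough sketch with placeholder names, assuming a causal-LM fine-tune with a Hugging Face `Trainer` / `SFTTrainer`):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("your-model-name")  # placeholder
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.unk_token

# mlm=False gives plain causal-LM labels (next-token prediction).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Or, if your examples already carry their own `labels`:
# from transformers import DataCollatorForSeq2Seq
# collator = DataCollatorForSeq2Seq(tokenizer, padding=True)

# Then pass data_collator=collator to your Trainer / SFTTrainer.
```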
Thanks Jerome and fantastic work - will check this!
@jeromeku Thanks for testing again! Hmm, the training loss being noticeably higher is really, really weird. I can understand why the VRAM reductions are less pronounced,...
Oh that is an issue - the pad_token must not be the same as the eos_token, otherwise the finetune will be incorrect. I'll see if I can extend the tokenizer...
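Until then, a rough sketch of the usual workaround (placeholder model name, assuming a standard Hugging Face tokenizer/model pair) is to add a dedicated pad token and resize the embeddings:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "your-base-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Give the tokenizer its own pad token instead of reusing EOS, then
# resize the embedding matrix so the new token gets a row.
if tokenizer.pad_token is None or tokenizer.pad_token == tokenizer.eos_token:
    tokenizer.add_special_tokens({"pad_token": "<|pad|>"})  # new, distinct token
    model.resize_token_embeddings(len(tokenizer))
    model.config.pad_token_id = tokenizer.pad_token_id
```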