Daniel Han
@dkobak Oh yep. PCA scaling is done, although slightly differently. I think (can't remember 100%) it was uniform but within [-0.0001f, 0.0001f]. Interesting! n/early_exaggeration sounds much better then. Oh yes...
@resnant Oh I just checked. Seems like I was wrong. I was working on https://github.com/rapidsai/cuml/pull/1383 [TSNE PCA Init + 50% Memory Reductions]. It included PCA Init, 50% mem reductions and...
@resnant Thanks a lot!!! I'll see what I can do.
Does the folder "checkpoints/checkpoint-3600" exist? Maybe it's corrupted?
Do you know exactly where / which line the error pops out?
You're correct! It seems like `max_seq_length`'s default of 4096 is auto scaling TinyLlama, causing bad outputs - I'll fix this asap - thanks for the report!
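In the meantime, a rough sketch of a workaround (assuming the usual `FastLanguageModel.from_pretrained` arguments, and the model name here is just an example) is to pass TinyLlama's native context length explicitly so no auto scaling kicks in:

```python
from unsloth import FastLanguageModel

# TinyLlama was trained with a 2048 context window, so asking for 4096
# triggers automatic scaling - pass the native length instead.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/tinyllama-bnb-4bit",  # example model name
    max_seq_length = 2048,   # match TinyLlama's trained context length
    load_in_4bit = True,
)
```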
I think you need to use `DataCollatorForLanguageModeling` or `DataCollatorForSeq2Seq`
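Something along these lines should work (a rough sketch with placeholder names, assuming a causal-LM fine-tune with a Hugging Face `Trainer` / `SFTTrainer`):

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("your-model-name")  # placeholder
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.unk_token

# mlm=False gives plain causal-LM labels (next-token prediction).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Or, if your examples already carry their own `labels`:
# from transformers import DataCollatorForSeq2Seq
# collator = DataCollatorForSeq2Seq(tokenizer, padding=True)

# Then pass data_collator=collator to your Trainer / SFTTrainer.
```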
Thanks Jerome and fantastic work - will check this!
@jeromeku Thanks for testing again! Hmm, the training loss being noticeably higher is really, really weird. I can understand why the VRAM reductions are less pronounced,...
Oh that is an issue - the pad_token must not be the same as the eos_token, otherwise the finetune will be incorrect. I'll see if I can extend the tokenizer...
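Until then, a rough sketch of the usual workaround (placeholder model name, assuming a standard Hugging Face tokenizer/model pair) is to add a dedicated pad token and resize the embeddings:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "your-base-model"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Give the tokenizer its own pad token instead of reusing EOS, then
# resize the embedding matrix so the new token gets a row.
if tokenizer.pad_token is None or tokenizer.pad_token == tokenizer.eos_token:
    tokenizer.add_special_tokens({"pad_token": "<|pad|>"})  # new, distinct token
    model.resize_token_embeddings(len(tokenizer))
    model.config.pad_token_id = tokenizer.pad_token_id
```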