VatsaDev

Results 88 comments of VatsaDev

Oh, I thought you wanted to expand the tokenizer. I've never thought of or heard of two tokenizers in the model at different points, except for multimodality. Might look into...

Huh, there's actually an issue with my project idea's name on it. I have a NanoChatGPT [here](https://github.com/VatsaDev/nanoChatGPT). It has chat functionality, human, bot, and endOfText tokens, along with a conversational...

This isn't even a dev issue, man. It's a common expression that one learns by paying attention in first grade instead of trying to ask GPT-4. Teeth -> performance Education...

Hmm, how are you getting this error? I've used several Python 3.7+ versions and it's fine.

That is a crazy high learning rate, which could be the issue. Also check your data, and check val loss for overfitting.
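One rough way to do the val-loss check: a minimal sketch, assuming you have logged train and val loss at the same eval points. The function name `first_overfit_step` and the `patience` parameter are made up for illustration and are not part of any training script.

```python
def first_overfit_step(train_losses, val_losses, patience=3):
    """Return the eval index of the best val loss once val loss has failed
    to improve for `patience` evals while train loss kept dropping.
    Returns None if that pattern never shows up."""
    best_val = float("inf")
    best_idx = 0
    for i, val in enumerate(val_losses):
        if val < best_val:
            # New best val loss: remember where it happened.
            best_val, best_idx = val, i
        elif i - best_idx >= patience and train_losses[i] < train_losses[best_idx]:
            # Val loss is stale but train loss is still falling: likely overfitting.
            return best_idx
    return None


# Made-up numbers for illustration only.
train = [3.0, 2.5, 2.0, 1.6, 1.3, 1.0]
val = [3.1, 2.7, 2.5, 2.6, 2.8, 3.0]
print(first_overfit_step(train, val))
```

If this returns an index, the checkpoint from around that eval is probably the one to keep.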

Mistral is more data secret sauce than architecture change, so it may only be slightly better.

You need to use pytorch-xla for that, or reimplement in JAX.

From my finetuning experience, this model has always been `1~5%`

That's just an arbitrary number, it can be whatever you want. Checkpoints save all the time, you can add a stop for a certain loss, or you can wait for...
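A minimal sketch of the "stop for a certain loss" idea, assuming a hypothetical `step_fn` that runs one training iteration and returns the current loss. `train_until`, `target_loss`, and `max_iters` are illustrative names here, not taken from any specific training script.

```python
def train_until(step_fn, target_loss, max_iters):
    """Call step_fn() until the loss it returns is <= target_loss or
    max_iters is reached. Returns (iterations_run, last_loss)."""
    loss = None
    iters_run = 0
    for it in range(1, max_iters + 1):
        loss = step_fn()  # hypothetical: one optimizer step, returns a float loss
        iters_run = it
        if loss <= target_loss:
            break  # hit the target loss, stop early
    return iters_run, loss


# Fake loss curve standing in for a real training step.
fake_losses = iter([2.0, 1.5, 0.9, 0.5])
print(train_until(lambda: next(fake_losses), target_loss=1.0, max_iters=10))
```

In a real run you would probably check val loss at eval intervals rather than the per-step train loss, since the latter is noisy.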