VatsaDev
Oh, I thought you wanted to expand the tokenizer. I've never thought of or heard of two tokenizers in the model at different points, except for multimodality? Might look into...
Huh, there's actually an issue with my project idea's name on it. I have a NanoChatGPT [here](https://github.com/VatsaDev/nanoChatGPT). It has chat functionality, human, bot, and endOfText tokens, along with a conversational...
This isn't even a dev issue, man. It's a common expression that one learns by paying attention in first grade instead of trying to ask GPT-4. Teeth -> performance Education...
Check the encoder/decoder alphabet dicts?
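A quick way to do that check, as a rough sketch: it assumes a char-level setup where the encoder is a `stoi` dict (char -> id) and the decoder is an `itos` dict (id -> char). Those names, the toy text, and the `check_vocab` helper are all hypothetical, so swap in whatever your data prep actually builds.

```python
# Hypothetical char-level vocab; the real one comes from your own dataset prep.
text = "hello world"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # encoder: char -> id (assumed name)
itos = {i: ch for ch, i in stoi.items()}      # decoder: id -> char (assumed name)

def check_vocab(stoi, itos, sample):
    """Return a list of problems found in the encoder/decoder dicts."""
    problems = []
    # Both dicts should describe the same alphabet.
    if len(stoi) != len(itos):
        problems.append(f"size mismatch: {len(stoi)} vs {len(itos)}")
    # Every char in the sample text must be encodable.
    missing = sorted(set(sample) - set(stoi))
    if missing:
        problems.append(f"chars missing from encoder: {missing}")
    # Encoding then decoding a char should give the same char back.
    for ch, i in stoi.items():
        if itos.get(i) != ch:
            problems.append(f"round trip fails for {ch!r}")
    return problems

print(check_vocab(stoi, itos, text))
```

An empty list means the two dicts agree with each other and cover the sample; anything else points at the char that breaks.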
Hmm, how are you getting this error? I've used several Python 3.7+ versions and it's fine.
That is a crazy high learning rate, which could be the issue. Also check your data, and check val loss for overfitting.
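For the val loss part, a minimal sketch of what "overfitting" looks like in the numbers: train loss keeps dropping while val loss climbs. The `looks_overfit` helper, the `patience` parameter, and the loss values below are made up for illustration, not taken from any real run.

```python
def looks_overfit(train_losses, val_losses, patience=3):
    """True if val loss rose on each of the last `patience` evals while train loss fell."""
    if len(train_losses) <= patience or len(val_losses) <= patience:
        return False  # not enough history to judge
    # Train loss is lower now than it was `patience` evals ago.
    train_falling = train_losses[-1] < train_losses[-1 - patience]
    # Val loss went up at every one of the last `patience` evals.
    val_rising = all(
        val_losses[i] > val_losses[i - 1]
        for i in range(len(val_losses) - patience, len(val_losses))
    )
    return train_falling and val_rising

# Made-up loss histories, one entry per eval interval.
train = [2.0, 1.5, 1.2, 1.0, 0.8]
val = [2.1, 1.7, 1.8, 1.9, 2.0]
print(looks_overfit(train, val))
```

If both losses blow up or go to NaN instead, that points back at the learning rate or the data rather than overfitting.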
Mistral is more data secret sauce than architecture change, so it may only be slightly better.
You need to use pytorch-xla for that, or reimplement in JAX.
From my finetuning experience, this model has always been `1~5%`.
That's just an arbitrary number; it can be whatever you want. Checkpoints save all the time, so you can add a stop for a certain loss, or you can wait for...
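A stop for a certain loss could look roughly like this. It is only a sketch: it assumes the arbitrary number is an iteration cap like `max_iters`, and `should_stop`, `target_loss`, and the fake loss values are hypothetical names and data, not part of any existing training script.

```python
def should_stop(iter_num, val_loss, max_iters=5000, target_loss=None):
    """Stop once val loss reaches the target, or when the iteration cap is hit."""
    if target_loss is not None and val_loss <= target_loss:
        return True
    return iter_num >= max_iters

# Stand-in for a real training loop: one fake val loss per eval step.
fake_val_losses = [3.2, 2.5, 1.9, 1.4, 1.1]
stopped_at = None
for iter_num, val_loss in enumerate(fake_val_losses):
    if should_stop(iter_num, val_loss, max_iters=5000, target_loss=1.5):
        stopped_at = iter_num  # a real loop would save a final checkpoint here
        break

print(stopped_at)
```

Leaving `target_loss=None` falls back to the plain iteration cap, so the check can sit in the loop without changing the default behavior.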