VatsaDev comments

Results: 88 comments by VatsaDev

Sample from a subset of the token_embedding_table

Oh, I thought you wanted to expand the tokenizer. I've never thought of or heard of two tokenizers in a model at different points, except for multimodality. Might look into...

Making nano chatgpt

Huh, there's actually an issue with my project idea's name on it. I have a NanoChatGPT [here](https://github.com/VatsaDev/nanoChatGPT). It has chat functionality and human, bot, and endOfText tokens, along with a conversational...

Meaning of teeth over education?

This isn't even a dev issue, man. It's a common expression that one learns by paying attention in first grade instead of trying to ask GPT-4. Teeth -> performance, Education...

I am getting encoding errors when I run sample.py with any start context

Check the encoder and decoder alphabet dicts?
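
A minimal sketch of that check, assuming a character-level setup where the encoder is a `stoi` dict and the decoder an `itos` dict (the names nanoGPT's char-level data prep uses, as far as I know). `build_vocab` and `missing_chars` are hypothetical helpers for illustration, not part of any repo. The idea is that any character in the start context that is absent from `stoi` will raise a `KeyError` at encode time.

```python
def build_vocab(text):
    """Build char-level encoder (stoi) and decoder (itos) dicts from training text."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for i, ch in enumerate(chars)}
    return stoi, itos


def missing_chars(prompt, stoi):
    """Return the characters in the prompt that the encoder cannot map."""
    return sorted(set(prompt) - set(stoi))


# Toy training text; a real run would load the dicts saved by the prep script.
stoi, itos = build_vocab("hello world")

start = "hello, World!"
unknown = missing_chars(start, stoi)
if unknown:
    print("characters not in the vocabulary:", unknown)
else:
    ids = [stoi[c] for c in start]
    print("round trip:", "".join(itos[i] for i in ids))
```

If the list is non-empty, either strip those characters from the start context or rebuild the vocabulary from data that contains them.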

Which Python version can be used

Hmm, how are you getting this error? I've used several Python 3.7+ versions and it's fine.

Is this loss curve normal

That is a crazy high learning rate, which could be the issue. Also check your data, and check the val loss for overfitting.

Question: Sliding window attention

Mistral's secret sauce is more its data than its architecture changes, so sliding window attention may only be slightly better.

How to train nanoGPT using TPUs?

You need to use PyTorch/XLA (`torch_xla`) for that, or reimplement in JAX.

What MFU score is to be expected?

From my finetuning experience, MFU for this model has always been around `1~5%`.
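
For context on what that percentage measures, here is a rough sketch of an MFU (model FLOPs utilization) estimate. It assumes the common approximation of `6 * n_params` FLOPs per token plus an attention term, and a peak of 312 TFLOPS (A100 bfloat16), which I believe is close to what nanoGPT's own estimate does. The function signature and the example numbers are hypothetical; swap in your own hardware peak and timings.

```python
def estimate_mfu(n_params, n_layer, n_head, head_dim, block_size,
                 fwdbwd_per_iter, dt, peak_flops=312e12):
    """Estimate achieved FLOPS as a fraction of the hardware peak.

    fwdbwd_per_iter: sequences processed per iteration (batch size * grad accumulation steps)
    dt: wall-clock seconds per iteration
    peak_flops: assumed hardware peak (default is an A100 bfloat16 figure)
    """
    # Approximate FLOPs per token: weight matmuls plus attention.
    flops_per_token = 6 * n_params + 12 * n_layer * n_head * head_dim * block_size
    # Each sequence holds block_size tokens.
    flops_per_iter = flops_per_token * block_size * fwdbwd_per_iter
    flops_achieved = flops_per_iter / dt
    return flops_achieved / peak_flops


# Hypothetical GPT-2-small-sized run: 124M params, 480 sequences per iteration, 10 s per iteration.
mfu = estimate_mfu(n_params=124e6, n_layer=12, n_head=12, head_dim=64,
                   block_size=1024, fwdbwd_per_iter=480, dt=10.0)
print(f"MFU: {mfu * 100:.2f}%")
```

Small batches, short sequences, and consumer GPUs whose real peak is far below the assumed one all push the reported number down, which is one plausible reason finetuning runs land in the low single digits.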

Training loss converges much earlier compared to max_iters

That's just an arbitrary number; it can be whatever you want. Checkpoints save all the time, so you can add a stop for a certain loss, or you can wait for...
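
A minimal sketch of the "stop for a certain loss" idea, written as a generic loop rather than nanoGPT's actual `train.py`. `train_until`, `step_fn`, and `target_loss` are hypothetical names; in a real script the comparison would more likely go wherever the val loss is evaluated.

```python
def train_until(step_fn, max_iters, target_loss=None):
    """Run up to max_iters steps, stopping early once the loss reaches target_loss.

    step_fn: callable taking the iteration index and returning that step's loss
    Returns (last_iteration, last_loss).
    """
    it, loss = -1, float("inf")
    for it in range(max_iters):
        loss = step_fn(it)
        if target_loss is not None and loss <= target_loss:
            break  # good enough, no need to wait for max_iters
    return it, loss


# Toy loss curve standing in for a real training step.
def toy_step(it):
    return 4.0 / (1 + it)


last_iter, last_loss = train_until(toy_step, max_iters=100, target_loss=1.0)
print(f"stopped at iter {last_iter} with loss {last_loss}")
```

With `target_loss=None` the loop simply runs to `max_iters`, which matches the default behaviour described above.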