VatsaDev

Results: 88 comments by VatsaDev

To my understanding, we don't add negative values to the tokenizer; we just extend the vocab, like this:

```python
# gpt-2 encodings
print("loading GPT-2 encodings...")
enc = tiktoken.get_encoding("gpt2")
encode = lambda...
```
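For context, the usual tiktoken pattern is to wrap the base encoding and append any new special tokens at the top of the id range, so nothing ever goes negative. A minimal sketch, with made-up token names:

```python
import tiktoken

base = tiktoken.get_encoding("gpt2")  # 50257 tokens, ids 0..50256

# wrap the base encoding and append new special tokens after the existing vocab
enc = tiktoken.Encoding(
    name="gpt2_chat",
    pat_str=base._pat_str,
    mergeable_ranks=base._mergeable_ranks,
    special_tokens={
        **base._special_tokens,       # keeps "<|endoftext|>" at 50256
        "<|user|>": base.n_vocab,     # new ids are appended, never negative
        "<|bot|>": base.n_vocab + 1,
    },
)

encode = lambda s: enc.encode(s, allowed_special="all")
decode = lambda ids: enc.decode(ids)
print(encode("<|user|>Hello<|bot|>"))
```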

The generation code works like this:

```python
@torch.no_grad()
def generate(self, idx, max_new_tokens, temperature=1.0, top_k=None):
    """
    Take a conditioning sequence of indices idx (LongTensor of shape (b,t)) and complete the sequence...
```
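In essence it's a plain autoregressive loop: crop the context to the block size, take the logits at the last position, scale by temperature, optionally mask everything outside the top-k, sample one token, append it, and repeat. A standalone sketch of that loop (the model here is assumed to return (b, t, vocab) logits directly, which is a simplification):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample(model, idx, max_new_tokens, block_size, temperature=1.0, top_k=None):
    """Autoregressive sampling sketch; idx is a LongTensor of shape (b, t)."""
    for _ in range(max_new_tokens):
        # never feed more than block_size tokens of context
        idx_cond = idx[:, -block_size:]
        logits = model(idx_cond)                  # (b, t, vocab), by assumption
        logits = logits[:, -1, :] / temperature   # only the last position matters
        if top_k is not None:
            # mask everything below the k-th largest logit
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[:, [-1]]] = -float("inf")
        probs = F.softmax(logits, dim=-1)
        idx_next = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, idx_next), dim=1)   # append and continue
    return idx
```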

Yep, it's pure garbage:

```
User: Hello?
Bot: IHi, Hi, there!, I Hi, I Hi'm I, I i i I Hi I HiHi,!< I HiHi., Hi I HiHi., Hi hi hi...
```

Yep, I realize that; that's why I mentioned the second part. I don't think it's possible without some sort of architecture change, as most LLMs are based on predicting...

That's really vague. You're going to have to give way more information than that, like dataset size and the GPT-2 model size you want to pretrain. The estimate from Llama-2-70B...

As I said before, when Karpathy trained the GPT-2 124M model on OpenWebText on 8xA100, it took him **96 hours**. That's with the default values and OpenWebText.

@hmbui-noze, for a decent model I would always recommend finetuning, but at my repo [nanoChatGPT](https://github.com/VatsaDev/nanoChatGPT) I have the hyperparams for finetuning, and these take ~26 min:

```python
eval_interval = 5...
```
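For reference, a nanoGPT-style finetune config usually just overrides a handful of globals; the values below are illustrative, not the exact nanoChatGPT hyperparameters:

```python
# illustrative finetune config in nanoGPT's config-file style
out_dir = "out-chat"
eval_interval = 5
eval_iters = 40
wandb_log = False

dataset = "chat"        # assumed data folder name
init_from = "gpt2"      # start from the pretrained 124M checkpoint

# tiny batches plus gradient accumulation keep memory low
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 2000        # example value; short runs are the point of finetuning

# small constant learning rate, no decay schedule
learning_rate = 3e-5
decay_lr = False
```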

I don't think GPT-2 was really meant for benchmarks other than perplexity on datasets. You could get the train/val loss when finetuning on a benchmark's train split, while using the val split...
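Since the training loop reports mean cross-entropy loss, perplexity on any split is just its exponential; a minimal sketch (the loss values here are placeholders):

```python
import math
import torch

# placeholder per-batch cross-entropy losses collected over a val split
val_losses = torch.tensor([3.1, 2.9, 3.0, 3.2])

mean_loss = val_losses.mean().item()
perplexity = math.exp(mean_loss)  # ppl = exp(mean cross-entropy)
print(f"val loss {mean_loss:.3f} -> perplexity {perplexity:.1f}")
```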

If your GPU is supported by PyTorch the regular way, it will work fine right off the bat. If it's not, you need code modifications.
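A quick way to tell which case you're in is to check what device PyTorch actually sees; a minimal sketch:

```python
import torch

# if a CUDA device shows up, the stock code paths should just work;
# otherwise you fall back to "mps"/"cpu" or need backend-specific changes
if torch.cuda.is_available():
    device = "cuda"
    print("using", torch.cuda.get_device_name(0))
elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
    device = "mps"  # Apple Silicon
else:
    device = "cpu"
print("device:", device)
```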