VatsaDev

Results: 88 comments by VatsaDev

To my understanding, we don't add negative values to the tokenizer; we just extend the vocab, like this:

```python
# gpt-2 encodings
print("loading GPT-2 encodings...")
enc = tiktoken.get_encoding("gpt2")
encode = lambda...
```
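For context, the usual tiktoken pattern is to wrap the base encoding and append any new special tokens at the top of the id range, so nothing ever goes negative. A minimal sketch, with made-up token names:

```python
import tiktoken

base = tiktoken.get_encoding("gpt2")  # 50257 tokens, ids 0..50256

# wrap the base encoding and append new special tokens after the existing vocab
enc = tiktoken.Encoding(
    name="gpt2_chat",
    pat_str=base._pat_str,
    mergeable_ranks=base._mergeable_ranks,
    special_tokens={
        **base._special_tokens,       # keeps "<|endoftext|>" at 50256
        "<|user|>": base.n_vocab,     # new ids are appended, never negative
        "<|bot|>": base.n_vocab + 1,
    },
)

encode = lambda s: enc.encode(s, allowed_special="all")
decode = lambda ids: enc.decode(ids)
print(encode("<|user|>Hello<|bot|>"))
```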

The generation code works like this:

```python
@torch.no_grad()
def generate(self, idx, max_new_tokens, temperature=1.0, top_k=None):
    """
    Take a conditioning sequence of indices idx (LongTensor of shape (b,t)) and complete the sequence...
```
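In essence it's a plain autoregressive loop: crop the context to the block size, take the logits at the last position, scale by temperature, optionally mask everything outside the top-k, sample one token, append it, and repeat. A standalone sketch of that loop (the model here is assumed to return (b, t, vocab) logits directly, which is a simplification):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample(model, idx, max_new_tokens, block_size, temperature=1.0, top_k=None):
    """Autoregressive sampling sketch; idx is a LongTensor of shape (b, t)."""
    for _ in range(max_new_tokens):
        # never feed more than block_size tokens of context
        idx_cond = idx[:, -block_size:]
        logits = model(idx_cond)                  # (b, t, vocab), by assumption
        logits = logits[:, -1, :] / temperature   # only the last position matters
        if top_k is not None:
            # mask everything below the k-th largest logit
            v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
            logits[logits < v[:, [-1]]] = -float("inf")
        probs = F.softmax(logits, dim=-1)
        idx_next = torch.multinomial(probs, num_samples=1)
        idx = torch.cat((idx, idx_next), dim=1)   # append and continue
    return idx
```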

Yep, it's pure garbage:

```
User: Hello?
Bot: IHi, Hi, there!, I Hi, I Hi'm I, I i i I Hi I HiHi,!< I HiHi., Hi I HiHi., Hi hi hi...
```

Yep, I realize that; that's why I mentioned the second part. I don't think it's possible without some sort of architecture change, as most LLMs are based on predicting...

That's really vague. You're going to have to give way more information than that, like dataset size and the GPT-2 model size you want to pretrain. The estimate from Llama-2-70B...

As I said before, when Karpathy trained the GPT-2 124M model on OpenWebText on 8xA100, it took him **96 hours**. That's with the default values and OpenWebText.

@hmbui-noze, for a decent model I would always recommend finetuning, but at my repo [nanoChatGPT](https://github.com/VatsaDev/nanoChatGPT) I have the hyperparams for finetuning, and these take ~26 min:

```python
eval_interval = 5...
```
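For reference, a nanoGPT-style finetune config usually just overrides a handful of globals; the values below are illustrative, not the exact nanoChatGPT hyperparameters:

```python
# illustrative finetune config in nanoGPT's config-file style
out_dir = "out-chat"
eval_interval = 5
eval_iters = 40
wandb_log = False

dataset = "chat"        # assumed data folder name
init_from = "gpt2"      # start from the pretrained 124M checkpoint

# tiny batches plus gradient accumulation keep memory low
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 2000        # example value; short runs are the point of finetuning

# small constant learning rate, no decay schedule
learning_rate = 3e-5
decay_lr = False
```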

I don't think GPT-2 was really meant for benchmarks other than perplexity on datasets. You could get the train/val loss when finetuning on a benchmark's train split, while using the val split...
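Since the training loop reports mean cross-entropy loss, perplexity on any split is just its exponential; a minimal sketch (the loss values here are placeholders):

```python
import math
import torch

# placeholder per-batch cross-entropy losses collected over a val split
val_losses = torch.tensor([3.1, 2.9, 3.0, 3.2])

mean_loss = val_losses.mean().item()
perplexity = math.exp(mean_loss)  # ppl = exp(mean cross-entropy)
print(f"val loss {mean_loss:.3f} -> perplexity {perplexity:.1f}")
```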

If your GPU is supported by PyTorch the regular way, it will work fine right off the bat. If it's not, you need code modifications.
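A quick way to tell which case you're in is to check what device PyTorch actually sees; a minimal sketch:

```python
import torch

# if a CUDA device shows up, the stock code paths should just work;
# otherwise you fall back to "mps"/"cpu" or need backend-specific changes
if torch.cuda.is_available():
    device = "cuda"
    print("using", torch.cuda.get_device_name(0))
elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
    device = "mps"  # Apple Silicon
else:
    device = "cpu"
print("device:", device)
```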