VatsaDev
@houda-w While what @the-crypt-keeper said is correct, the issue of incomplete sentences can also be a different one. GPT-2 is an autocomplete model; it will just keep going. It fill...
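Since GPT-2 just keeps completing until it runs out of new tokens, one practical workaround (not from this thread, just a sketch) is to trim the sampled text back to the last complete sentence:

```python
import re

def trim_to_last_sentence(text: str) -> str:
    """Cut generated text at the last sentence-ending punctuation mark.

    GPT-2 has no notion of being 'done', so sampling usually stops
    mid-sentence when max_new_tokens runs out; trimming the tail is a cheap fix.
    """
    # Find the final ., !, or ? and drop everything after it.
    match = re.search(r"[.!?](?=[^.!?]*$)", text)
    return text[: match.end()] if match else text

print(trim_to_last_sentence("The cat sat on the mat. Then it jump"))
# -> "The cat sat on the mat."
```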
1. Try downloading the entire git repository and working from there; that made it work for me. Also, get your Python on PATH, and just cd into the directory...
Dude, you are on a single A100; you need more to scale. You've looked at every parameter other than model size. Also, do you have a link to the dataset?...
Ah, I see. The dataset should be fine, and the Llama 2 tokenizer should work, but you would need to change the dataloader to tokenize the PDFs.
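Something like this as the prep step (just a sketch: the file paths, the flat uint16 `.bin` output format, and the `tokenizer.model` location are assumptions, not from the original comment):

```python
# Rough prep-script sketch: extract text from PDFs, tokenize with the
# Llama 2 SentencePiece model, and dump token ids to a flat .bin file
# (nanoGPT-style memmap format). Paths and file layout are assumptions.
import glob
import numpy as np
from pypdf import PdfReader
from sentencepiece import SentencePieceProcessor

sp = SentencePieceProcessor(model_file="tokenizer.model")  # Llama 2 tokenizer

ids = []
for path in glob.glob("data/pdfs/*.pdf"):
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    ids.extend(sp.encode(text))
    ids.append(sp.eos_id())  # separate documents with EOS

# The Llama 2 vocab (32000) fits in uint16, matching a nanoGPT-style train.bin.
np.array(ids, dtype=np.uint16).tofile("data/pdfs/train.bin")
print(f"wrote {len(ids)} tokens")
```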
Hmm, 25M params, but also a max sequence length of 1024; try lowering it to 512? Also, a +1 loss is pretty good at that scale, but if you're using...
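For the sequence-length suggestion: in a nanoGPT-style config the knob is `block_size`, so the change is just an override like the sketch below (the exact names are assumed, since I don't know which training script is being used):

```python
# Hypothetical nanoGPT-style config override; exact names (block_size,
# batch_size, gradient_accumulation_steps) depend on the training script.
block_size = 512                    # halve the context from 1024
batch_size = 32                     # spend the freed memory on a bigger batch
gradient_accumulation_steps = 4     # optional: keep tokens/step up via accumulation
```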
`{'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-05, 'vocab_size': -1}` - that seems big for an A6000. On `'mps'`, `compile=True` fails, and `compile=True` also failed on the V100 and...
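Those are Llama-7B-class dimensions. A quick back-of-the-envelope count (assuming a standard Llama-style block and a 32000-token vocab, since the `-1` normally gets filled in from the tokenizer) lands around 6.7B params, i.e. roughly 13 GB of fp16 weights before activations and optimizer state:

```python
# Back-of-the-envelope parameter count for a Llama-style model with the
# config above; vocab_size is assumed to be 32000 (the -1 is normally
# filled in from the tokenizer).
dim, n_layers, multiple_of, vocab_size = 4096, 32, 256, 32000

# SwiGLU hidden size, computed the way the Llama reference code does it.
hidden = int(2 * (4 * dim) / 3)
hidden = multiple_of * ((hidden + multiple_of - 1) // multiple_of)

per_layer = 4 * dim * dim        # wq, wk, wv, wo
per_layer += 3 * dim * hidden    # w1, w2, w3 of the SwiGLU FFN
per_layer += 2 * dim             # the two RMSNorm weight vectors

total = n_layers * per_layer + 2 * vocab_size * dim + dim  # + embeddings, output head, final norm
print(f"~{total / 1e9:.1f}B params, ~{2 * total / 1e9:.0f} GB of fp16 weights")
```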
While I personally don't own an MPS device, and the largest I've trained is gpt2-medium, I have gotten a `1.12` loss. I'd like to ask: have you tried increasing the...
What do you mean by "Sharpen filter"? What does that mean for the inputs?
To my understanding, we don't add negative values to the tokenizer, we just extend the vocab, like this:

```python
# gpt-2 encodings
print("loading GPT-2 encodings...")
enc = tiktoken.get_encoding("gpt2")
encode = lambda...
```
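The snippet is cut off there; for reference, extending a tiktoken GPT-2 encoding with extra special tokens usually looks like the sketch below (the token names `<human>`/`<bot>` and their ids are just examples, not from the original comment):

```python
import tiktoken

base = tiktoken.get_encoding("gpt2")

# Build a new Encoding that reuses GPT-2's merges but appends special tokens
# on top of the 50257-entry vocab.
enc = tiktoken.Encoding(
    name="gpt2_extended",
    pat_str=base._pat_str,
    mergeable_ranks=base._mergeable_ranks,
    special_tokens={
        **base._special_tokens,   # keeps <|endoftext|> at 50256
        "<human>": 50257,         # example special token
        "<bot>": 50258,           # example special token
    },
)

encode = lambda s: enc.encode(s, allowed_special="all")
print(encode("<human> hi <bot>"))
```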
> the random seed may draw the same training sets and in the same order

That would be amazingly rare, wouldn't it?
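To put a rough number on "amazingly rare" (assuming nanoGPT-style batching, where each step independently draws `batch_size` random start offsets from the training data; the figures below are illustrative, not from the thread):

```python
# Rough odds that two independently seeded runs draw identical batches,
# assuming each step picks batch_size uniform random offsets from ~N positions.
import math

N = 9_000_000     # usable start positions in the training data (assumed)
batch_size = 12
steps = 1000

# P = (1/N) ** (batch_size * steps), reported as a power of ten.
log10_p = -batch_size * steps * math.log10(N)
print(f"P(identical draws) ~ 10^{log10_p:.0f}")
```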