VatsaDev
@houda-w While what @the-crypt-keeper said is correct, the issue of incomplete sentences can also be a different one. GPT-2 is an autocomplete model; it will just keep going. It fill...
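Since GPT-2 just keeps completing until it runs out of new tokens, one practical workaround (not from this thread, just a sketch) is to trim the sampled text back to the last complete sentence:

```python
import re

def trim_to_last_sentence(text: str) -> str:
    """Cut generated text at the last sentence-ending punctuation mark.

    GPT-2 has no notion of being 'done', so sampling usually stops
    mid-sentence when max_new_tokens runs out; trimming the tail is a cheap fix.
    """
    # Find the final ., !, or ? and drop everything after it.
    match = re.search(r"[.!?](?=[^.!?]*$)", text)
    return text[: match.end()] if match else text

print(trim_to_last_sentence("The cat sat on the mat. Then it jump"))
# -> "The cat sat on the mat."
```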
1. Try downloading the entire git repository and working from there; that made it work for me. Also, get your Python on PATH, and just cd into the directory...
Dude, you are on a single A100; you need more to scale. You've looked at every parameter other than model size. Also, do you have a link to the dataset?...
Ah, I see. The dataset should be fine, and the Llama 2 tokenizer should work, but you would need to change the dataloader to tokenize the PDFs.
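Something like this as the prep step (just a sketch: the file paths, the flat uint16 `.bin` output format, and the `tokenizer.model` location are assumptions, not from the original comment):

```python
# Rough prep-script sketch: extract text from PDFs, tokenize with the
# Llama 2 SentencePiece model, and dump token ids to a flat .bin file
# (nanoGPT-style memmap format). Paths and file layout are assumptions.
import glob
import numpy as np
from pypdf import PdfReader
from sentencepiece import SentencePieceProcessor

sp = SentencePieceProcessor(model_file="tokenizer.model")  # Llama 2 tokenizer

ids = []
for path in glob.glob("data/pdfs/*.pdf"):
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    ids.extend(sp.encode(text))
    ids.append(sp.eos_id())  # separate documents with EOS

# The Llama 2 vocab (32000) fits in uint16, matching a nanoGPT-style train.bin.
np.array(ids, dtype=np.uint16).tofile("data/pdfs/train.bin")
print(f"wrote {len(ids)} tokens")
```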
Hmm, 25M params, but also a max sequence length of 1024; try lowering it to 512? Also, a +1 loss is pretty good at that scale, but if you're using...
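For the sequence-length suggestion: in a nanoGPT-style config the knob is `block_size`, so the change is just an override like the sketch below (the exact names are assumed, since I don't know which training script is being used):

```python
# Hypothetical nanoGPT-style config override; exact names (block_size,
# batch_size, gradient_accumulation_steps) depend on the training script.
block_size = 512                    # halve the context from 1024
batch_size = 32                     # spend the freed memory on a bigger batch
gradient_accumulation_steps = 4     # optional: keep tokens/step up via accumulation
```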
`{'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-05, 'vocab_size': -1}` - that seems big for an A6000. On `'mps'`, `compile=True` fails, and `compile=True` also failed on the V100 and...
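Those are Llama-7B-class dimensions. A quick back-of-the-envelope count (assuming a standard Llama-style block and a 32000-token vocab, since the `-1` normally gets filled in from the tokenizer) lands around 6.7B params, i.e. roughly 13 GB of fp16 weights before activations and optimizer state:

```python
# Back-of-the-envelope parameter count for a Llama-style model with the
# config above; vocab_size is assumed to be 32000 (the -1 is normally
# filled in from the tokenizer).
dim, n_layers, multiple_of, vocab_size = 4096, 32, 256, 32000

# SwiGLU hidden size, computed the way the Llama reference code does it.
hidden = int(2 * (4 * dim) / 3)
hidden = multiple_of * ((hidden + multiple_of - 1) // multiple_of)

per_layer = 4 * dim * dim        # wq, wk, wv, wo
per_layer += 3 * dim * hidden    # w1, w2, w3 of the SwiGLU FFN
per_layer += 2 * dim             # the two RMSNorm weight vectors

total = n_layers * per_layer + 2 * vocab_size * dim + dim  # + embeddings, output head, final norm
print(f"~{total / 1e9:.1f}B params, ~{2 * total / 1e9:.0f} GB of fp16 weights")
```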
While I personally don't own an MPS device, and the largest I've trained is gpt2-medium, I have gotten a `1.12` loss. I'd like to ask: have you tried increasing the...
What do you mean by "Sharpen filter"? What does that mean for the inputs?
To my understanding, we don't add negative values to the tokenizer, we just extend the vocab, like this:

```python
# gpt-2 encodings
print("loading GPT-2 encodings...")
enc = tiktoken.get_encoding("gpt2")
encode = lambda...
```
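The snippet is cut off there; for reference, extending a tiktoken GPT-2 encoding with extra special tokens usually looks like the sketch below (the token names `<human>`/`<bot>` and their ids are just examples, not from the original comment):

```python
import tiktoken

base = tiktoken.get_encoding("gpt2")

# Build a new Encoding that reuses GPT-2's merges but appends special tokens
# on top of the 50257-entry vocab.
enc = tiktoken.Encoding(
    name="gpt2_extended",
    pat_str=base._pat_str,
    mergeable_ranks=base._mergeable_ranks,
    special_tokens={
        **base._special_tokens,   # keeps <|endoftext|> at 50256
        "<human>": 50257,         # example special token
        "<bot>": 50258,           # example special token
    },
)

encode = lambda s: enc.encode(s, allowed_special="all")
print(encode("<human> hi <bot>"))
```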
> the random seed may draw the same training sets and in the same order

That would be amazingly rare, wouldn't it?
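To put a rough number on "amazingly rare" (assuming nanoGPT-style batching, where each step independently draws `batch_size` random start offsets from the training data; the figures below are illustrative, not from the thread):

```python
# Rough odds that two independently seeded runs draw identical batches,
# assuming each step picks batch_size uniform random offsets from ~N positions.
import math

N = 9_000_000     # usable start positions in the training data (assumed)
batch_size = 12
steps = 1000

# P = (1/N) ** (batch_size * steps), reported as a power of ten.
log10_p = -batch_size * steps * math.log10(N)
print(f"P(identical draws) ~ 10^{log10_p:.0f}")
```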