
The simplest, fastest repository for training/finetuning medium-sized GPTs.

Results: 297 nanoGPT issues, sorted by recently updated

Hello, could someone explain to me how the dataset is divided between all the GPUs? I know that PyTorch has something like DistributedSampler to do that, but I...
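For context on the question above: PyTorch's `DistributedSampler` gives each rank every `num_replicas`-th index, padding the index list by wrapping around so every rank sees the same number of samples. (Note that nanoGPT's `train.py` doesn't use a sampler at all; each DDP rank just draws random batches from the memory-mapped token file with a rank-dependent seed.) A minimal pure-Python sketch of the round-robin sharding, with `shuffle` omitted for clarity — `shard_indices` is my name, not a PyTorch API:

```python
import math

def shard_indices(dataset_len, num_replicas, rank):
    """Mimic DistributedSampler's index assignment (no shuffling).

    Each rank receives every num_replicas-th index; the index list is
    padded by wrapping around so all ranks get the same sample count.
    """
    num_samples = math.ceil(dataset_len / num_replicas)   # per-rank count
    total_size = num_samples * num_replicas
    indices = list(range(dataset_len))
    indices += indices[: total_size - dataset_len]        # pad by wrapping
    return indices[rank:total_size:num_replicas]          # strided slice for this rank

# Example: 10 samples split across 3 GPUs
for r in range(3):
    print(r, shard_indices(10, 3, r))
```

With 10 samples and 3 replicas, each rank gets 4 indices and the first two samples appear twice in an epoch — the same padding behavior the real sampler has.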

Hello, I am training GPT-2 from scratch, but I found that the data processing of openwebtext is too slow, and our GPU server can't connect to the internet. It's taken...

*Accidentally messed up the PR and the branch, so let's try one more time.* I really don't like making such somewhat big PRs, but I don't want to bombard with...

I have been trying to use GPT2-1.5b to do some Q/A, but it seems that the model keeps generating (repeating itself over and over again) until max tokens are reached....
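On the repetition issue above: GPT-2 has no built-in notion of when an answer is finished, so greedy decoding will happily loop until `max_new_tokens`. Two common mitigations are stopping early on a delimiter/EOS token and damping the logits of already-generated tokens (a CTRL-style repetition penalty). A toy decoding loop illustrating both — `step_fn` is a stand-in for a real model forward pass, not a nanoGPT API:

```python
def generate(step_fn, stop_token, max_new_tokens, repetition_penalty=1.2):
    """Toy greedy decoding loop with early stop and repetition penalty.

    step_fn(tokens) must return {token_id: logit} for the next position;
    here it stands in for a real model forward pass (an assumption).
    """
    tokens = []
    for _ in range(max_new_tokens):
        logits = dict(step_fn(tokens))            # copy so we can modify
        for t in set(tokens):                     # damp already-seen tokens
            if t in logits:
                logits[t] = (logits[t] / repetition_penalty
                             if logits[t] > 0 else logits[t] * repetition_penalty)
        nxt = max(logits, key=logits.get)         # greedy pick
        if nxt == stop_token:                     # stop on delimiter/EOS
            break
        tokens.append(nxt)
    return tokens

# toy "model": after 3 tokens the stop token (id 0) dominates
def step_fn(tokens):
    return {0: 5.0} if len(tokens) >= 3 else {0: 0.1, 7: 1.0, 8: 0.9}

print(generate(step_fn, stop_token=0, max_new_tokens=10))  # stops well before 10
```

In a real setup the stop token would be something like the tokenizer's end-of-text id, and the prompt would be formatted so the model learns (or is shown by few-shot examples) to emit it after each answer.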

It could be interesting to have a strongly opinionated guide from the author addressing some typical issues: - whether or not to freeze some layers while fine-tuning, and...

Hi there, I have a custom dataset that is quite large (~40GB), similar to the openwebtext data you present as an example. My dataset is not on Hugging Face, and running...
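A note relevant to both data-preparation questions above: nanoGPT's data loader only needs a flat binary file of uint16 token ids (`train.bin` / `val.bin`) that it can memory-map, so a custom or offline corpus can be prepared by streaming it through a tokenizer in chunks without ever holding 40GB in RAM. A rough sketch with numpy — the byte-value "tokenizer" and file name here are toy stand-ins for tiktoken and the real shard files:

```python
import numpy as np, os, tempfile

def write_bin(token_chunks, path):
    """Append token chunks to a flat uint16 file, streaming (no full corpus in RAM)."""
    with open(path, "wb") as f:
        for chunk in token_chunks:
            np.asarray(chunk, dtype=np.uint16).tofile(f)

def read_batch(path, block_size, offset=0):
    """Memory-map the file (as nanoGPT's get_batch does) and slice one block."""
    data = np.memmap(path, dtype=np.uint16, mode="r")
    return data[offset : offset + block_size]

# toy usage: "tokenize" two text shards as raw byte values
path = os.path.join(tempfile.gettempdir(), "train_demo.bin")
write_bin(([ord(c) for c in s] for s in ["hello ", "world"]), path)
print(list(read_batch(path, 5)))  # first 5 token ids
```

Because the reader is a memmap, batch sampling stays cheap regardless of file size; the only real work is the one-time streaming tokenization pass.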

I am trying to train the gpt2-small model with DDP on 8x 80GB Nvidia H100 GPUs. Regardless of the PyTorch nightly version, I always end up with the below error...

```
RuntimeError: Error(s) in loading state_dict for GPT2LMHeadModel:
    size mismatch for transformer.h.0.attn.c_attn.weight: copying a param with shape torch.Size([2304, 768]) from checkpoint, the shape in current model is torch.Size([768, 2304]).
    size mismatch...
```
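That particular mismatch (`[2304, 768]` vs `[768, 2304]`) is usually the Conv1D-vs-Linear layout difference: Hugging Face's GPT-2 stores its attention/MLP projections as `Conv1D` modules with shape `(in_features, out_features)`, while an `nn.Linear`-based model (like nanoGPT's) stores `(out_features, in_features)`, so those weights must be transposed when converting checkpoints in either direction. nanoGPT's `from_pretrained` does exactly this for a small set of key suffixes. A dependency-free sketch of the idea — transpose done on nested lists so it runs without torch, and the key list follows the one in the nanoGPT loader:

```python
def transpose(mat):
    """Plain-Python matrix transpose (stands in for tensor.t())."""
    return [list(row) for row in zip(*mat)]

# Weights stored as Conv1D on the HF side, nn.Linear on the nanoGPT side;
# these key suffixes need transposing when converting a state dict.
TRANSPOSED = ("attn.c_attn.weight", "attn.c_proj.weight",
              "mlp.c_fc.weight", "mlp.c_proj.weight")

def fix_state_dict(sd):
    """Transpose only the keys whose layout differs; pass everything else through."""
    return {k: transpose(v) if k.endswith(TRANSPOSED) else v for k, v in sd.items()}

demo = {"transformer.h.0.attn.c_attn.weight": [[1, 2, 3], [4, 5, 6]]}  # toy 2x3 "tensor"
print(fix_state_dict(demo))
```

With real tensors the comprehension body would be `v.t().contiguous()`; the point is that the fix is a per-key transpose at load time, not a change to either model.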