Rahul C


Hi @juyiming, the hyper-parameters for all GLUE tasks (including MNLI and MRPC) and SQUAD_V2.0 are available in the Appendix A4 section. ![image](https://user-images.githubusercontent.com/16897807/131207616-09c358a6-b408-473a-ad6e-1daea8811e71.png)

Change the loss function in train.py to the following.

```
def loss(self, x, target, m):
    # x: b,10  target: b
    target = target*x
    zero_tensor = torch.tensor(0.0).cuda()
    loss = torch.max(zero_tensor, m-(target-x))**2
    loss = torch.mean(loss)
    return loss
```

...

Also, since you hit an issue at line 43, you may additionally get an error at line _102_, `acc = pred.eq(labels).cpu().sum().data[0]`. In that case, change it to `acc = pred.eq(labels).cpu().sum().item()`.
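For context, `sum()` returns a 0-dim tensor, and recent PyTorch versions refuse to index it with `[0]`, whereas `.item()` converts it to a plain Python number. A minimal sketch with made-up `pred` and `labels` values (they are illustrative only, not taken from the repository):

```python
import torch

# Hypothetical predictions and labels, for illustration only.
pred = torch.tensor([1, 0, 2, 2])
labels = torch.tensor([1, 1, 2, 0])

# eq() gives a boolean tensor; sum() reduces it to a 0-dim tensor.
correct = pred.eq(labels).cpu().sum()

# .item() extracts the Python scalar from the 0-dim tensor.
acc = correct.item()
print(acc)
```

Here `acc` counts the matching positions, so it can be divided by `len(labels)` if a ratio is needed.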

@CatTimson @madroidmaq can you try setting `pin_memory=False` in this [line](https://github.com/karpathy/llama2.c/blob/bd182289c596fa6059eb7b3b7c8ccd04b5c90fc3/tinystories.py#L238C40-L238C50) ?

> > @CatTimson @madroidmaq can you try setting `pin_memory=False` in this [line](https://github.com/karpathy/llama2.c/blob/bd182289c596fa6059eb7b3b7c8ccd04b5c90fc3/tinystories.py#L238C40-L238C50) ?
>
> @RahulSChand According to your method, my problem disappeared, I can train and see the detailed...

@CatTimson what is your PyTorch version? Use `print(torch.__version__)` to check. PyTorch `2.0.1+cu117` works for me. You can get the same version in a new environment with `pip install torch==2.0.1+cu117 --index-url https://download.pytorch.org/whl/cu117`
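If importing torch itself is slow or failing in the environment, the installed version can also be read from the package metadata. This is only a sketch; the helper name `installed_version` is made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Reads the version from pip metadata without importing torch.
torch_version = installed_version("torch")
print(torch_version or "torch is not installed in this environment")
```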

@madroidmaq can you remove the `time.time()` calls from the following lines? https://github.com/karpathy/llama2.c/blob/bd182289c596fa6059eb7b3b7c8ccd04b5c90fc3/train.py#L249 https://github.com/karpathy/llama2.c/blob/bd182289c596fa6059eb7b3b7c8ccd04b5c90fc3/train.py#L322 Just set `t0 = 0` and `t1 = 0` and check. Also, you can change below to `print(...., flush=True)` to see if the issue...
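As a rough illustration of the `flush=True` suggestion: stdout can be buffered, so a hung process may look silent even though earlier log lines were already produced. The helper below is hypothetical (its name and log format are assumptions, not the actual train.py code):

```python
def log_step(step, loss, dt_ms):
    """Print one training log line and flush stdout immediately."""
    line = f"{step} | loss {loss:.4f} | {dt_ms:.2f}ms"
    # flush=True forces the line out now instead of waiting for the buffer to fill.
    print(line, flush=True)
    return line


log_step(10, 2.5, 12.0)
```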

@CatTimson @kunwar-vikrant @madroidmaq I was able to reproduce the error when using a custom dataset. It happens because the data_dir path `./data/tok{vocab_size}/` doesn't have any `.bin` files. You can confirm if...

@madroidmaq what is inside your `data/tok4096/` folder? It should have a bunch of `.bin` files. If it's empty, train.py will be stuck; in that case try rerunning train_vocab/pretokenize...
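A quick way to check the folder before launching training is sketched below. The helper name `list_bin_shards` is made up for illustration, and the path assumes a vocab size of 4096 as in the comment above:

```python
from pathlib import Path


def list_bin_shards(data_dir):
    """Return the sorted .bin files directly inside data_dir (empty list if none exist)."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(root.glob("*.bin"))


# Assumed location; adjust to match your own vocab size.
shards = list_bin_shards("data/tok4096")
if shards:
    print(f"Found {len(shards)} .bin file(s)")
else:
    print("No .bin files found; the pretokenize step probably needs to be rerun")
```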

@madroidmaq You can run `apt install sentencepiece` and the `spm_train` command should work. I don't think this change in the script is necessary.