Allan Jie

Results: 72 comments of Allan Jie

Just looking at the code: does it affect the randomness? It seems we always take the samples in order within a single `__iter__` call in the dataset.
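
For illustration, a minimal sketch of the pattern I mean (the `OrderedDataset` class and the toy data are hypothetical, not the actual code in question):

```python
import torch
from torch.utils.data import DataLoader, IterableDataset

class OrderedDataset(IterableDataset):
    def __init__(self, samples):
        self.samples = samples

    def __iter__(self):
        # No shuffling here: within one __iter__ call, samples come out
        # in their stored order, so the iteration order is fixed across epochs
        # unless shuffling happens before or inside __iter__.
        for sample in self.samples:
            yield sample

loader = DataLoader(OrderedDataset(list(range(8))), batch_size=4)
for batch in loader:
    print(batch)  # tensor([0, 1, 2, 3]) then tensor([4, 5, 6, 7]), always in order
```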

So, may I draw the following conclusions: 1. It would be better NOT to perform generation during training, even though our standard practice would be evaluating the model (on the...

> So does that mean that if I want to eval every epoch, I would have to merge the LoRA adapter and then run `model.generate` at every epoch? I...

For PPL, you just need a forward pass; I don't think you need to call the `generate` function.
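
For illustration, a minimal sketch of computing PPL with a forward pass only (the `gpt2` checkpoint and the example text are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Example sentence for perplexity evaluation."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss;
    # no call to generate() is needed.
    outputs = model(**inputs, labels=inputs["input_ids"])

ppl = torch.exp(outputs.loss)
print(f"PPL: {ppl.item():.2f}")
```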

> @markusdr u might use following
>
> ```python
> optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
> ```

But in this case, does it mean we cannot create different parameter groups?
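
For illustration, a minimal sketch showing that `torch.optim.AdamW` still accepts parameter groups (assuming `model` and `lr` from the snippet above; the decay/no-decay split is just an example):

```python
import torch

# Split parameters into two groups: weights that get weight decay,
# and biases/LayerNorm parameters that do not.
decay, no_decay = [], []
for name, param in model.named_parameters():
    if "bias" in name or "LayerNorm" in name:
        no_decay.append(param)
    else:
        decay.append(param)

# AdamW accepts a list of parameter-group dicts, so per-group
# hyperparameters (weight decay, learning rate, ...) remain possible.
optimizer = torch.optim.AdamW(
    [
        {"params": decay, "weight_decay": 0.01},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=lr,  # base lr, applied to any group that does not override it
)
```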

I have the same issue here for a larger Hugging Face model (i.e., 30B) using DeepSpeed `init_inference`, but there is no problem with a smaller model (6.7B).
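
For reference, a minimal sketch of the kind of call that triggers this (the checkpoint name and `mp_size` are assumptions, not my exact setup):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Works for a ~6.7B model but fails for ~30B in my case.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-30b")

# Wrap the model with DeepSpeed's inference engine.
model = deepspeed.init_inference(
    model,
    mp_size=4,                       # tensor-parallel degree (assumption)
    dtype=torch.float16,
    replace_with_kernel_inject=True, # use DeepSpeed's fused kernels
)
```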

These are the two files:
- https://github.com/allanj/Deductive-MWP/blob/main/data/mawps_asdiv-a_svamp/trainset_nodup.json
- https://github.com/allanj/Deductive-MWP/blob/main/data/mawps_asdiv-a_svamp/testset_nodup.json

That would be five checkpoints, as the experiments were conducted with five-fold cross-validation. Do you want all of them?

Sorry, I don't think I kept that (only the log files are available) due to limited space. But I can try to run the experiments again for you.

I'm currently on leave for 10 days, so I will probably update you later.