Musab Gultekin
Try disabling Instant Run
I fixed it by doing this right after the install-dependencies section:

```
!pip install jaxlib==0.1.67
```

Then restart the runtime if it asks. Though it feels so fragile. Don't...
I haven't tested this, but will this allow 70B on 8x80GB? I was only able to fully fine-tune 70B with CPU offloading.
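For reference, a minimal sketch of the kind of CPU-offloading setup that made the full 70B fine-tune fit for me, assuming PyTorch FSDP launched via torchrun; the tiny `Linear` is a placeholder standing in for the real model:

```
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload

# Assumes torchrun has set the usual rendezvous env vars.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Placeholder module standing in for the 70B model.
model = torch.nn.Linear(4096, 4096)

# offload_params=True keeps the sharded parameters in host RAM and copies
# them to the GPU only for compute -- slower, but it frees enough GPU
# memory for a full-weight fine-tune.
model = FSDP(
    model,
    cpu_offload=CPUOffload(offload_params=True),
    device_id=torch.cuda.current_device(),
)
```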
Are the hyperparams similar to the llama-2 instruct model's training? Otherwise, maybe we can also change some of the default hyperparams, such as the LR. I see it's set to 2e-5 for now,...
I could probably do a W&B sweep. Will definitely let you know when I have some results.
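In case it helps, a minimal sketch of what such a sweep could look like, assuming the W&B sweeps API; the search range, metric name, and project name are placeholders:

```
import wandb

# Illustrative sweep over the learning rate only.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-6, "max": 1e-4},
    },
}

def train():
    run = wandb.init()
    lr = run.config.lr
    # ... the real training loop would go here, logging the swept metric:
    run.log({"loss": 0.0})  # placeholder value
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="finetune-sweeps")
wandb.agent(sweep_id, function=train, count=10)
```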
@rohan-varma Have you tested whether full-weight training works with 8x80GB? Maybe it works if we use 8-bit AdamW?
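A minimal sketch of what that swap could look like, assuming the `bitsandbytes` package; the `Linear` is a placeholder for the real model, and whether this actually fits on 8x80GB is untested:

```
import torch
import bitsandbytes as bnb

# Placeholder module standing in for the full model.
model = torch.nn.Linear(4096, 4096).cuda()

# 8-bit AdamW keeps the optimizer state in int8, cutting its memory
# roughly 4x versus fp32 state -- the extra headroom that full-weight
# training would need.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-5)
```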
We can also use `torch.nn.utils.clip_grad_norm_` instead of manually calculating the norms: https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html
We can use `float('inf')` instead of 1, so it doesn't clip?
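A minimal sketch of both suggestions together; the tiny model and dummy backward pass are placeholders:

```
import torch
from torch.nn.utils import clip_grad_norm_

model = torch.nn.Linear(8, 8)  # placeholder model
model(torch.randn(2, 8)).sum().backward()

# clip_grad_norm_ computes the total norm over all parameter gradients,
# rescales them in place if it exceeds max_norm, and returns the norm --
# so we keep the value for logging without computing it manually.
total_norm = clip_grad_norm_(model.parameters(), max_norm=1.0)

# With max_norm=float('inf') nothing is ever rescaled, so this is a pure
# "measure the grad norm" call.
unclipped_norm = clip_grad_norm_(model.parameters(), max_norm=float("inf"))
```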
@ebsmothers Lint fixed. Will make sure to run it next time. Thanks!
@leoentersthevoid That is a wonderful idea. In fact, I just looked into the possibility, and this is definitely feasible. I wonder if we can use YouTube-based podcasts OR scraping...