Musab Gultekin

Results 55 comments of Musab Gultekin

I fixed it by adding this right after the install-dependencies section: ``` !pip install jaxlib==0.1.67 ``` and restarting the runtime if it asks. Though it feels so fragile. Don't...

I haven't tested this, but will this allow 70B on 8x80GB? I was only able to fully fine-tune 70B with CPU offloading.

Are the hyperparams similar to the llama-2 instruct model's training? Otherwise, maybe we can also change some default hyperparams, such as the LR. I see it's set to 2e-5 for now,...

I could probably do a W&B sweep. Will definitely let you know when I have some results.

@rohan-varma Have you tested whether full-weight training works with 8x80GB? Maybe it works if we use 8-bit AdamW?

We can also use `torch.nn.utils.clip_grad_norm_` instead of manually calculating the norms: https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html

We can use `float('inf')` instead of 1, so it doesn't clip.
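A minimal sketch of both points above, using a tiny throwaway model purely for illustration: `clip_grad_norm_` replaces the manual norm computation, and passing `float('inf')` as `max_norm` still returns the total gradient norm without actually clipping anything.

```python
import torch

# Tiny stand-in model, just to produce some gradients.
model = torch.nn.Linear(4, 2)
loss = model(torch.randn(3, 4)).sum()
loss.backward()

# Clip so the total gradient norm is at most 1.0.
# The call returns the norm computed *before* clipping.
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# With max_norm=float('inf'), the norm is still computed and returned,
# but the clip coefficient clamps to 1.0, so gradients are left untouched.
unclipped_norm = torch.nn.utils.clip_grad_norm_(
    model.parameters(), max_norm=float("inf")
)
```

After the first call the gradients have norm at most 1.0, so the second (non-clipping) call just reports that norm back.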

@ebsmothers Lint fixed. Will make sure to run it next time. Thanks!

@leoentersthevoid That is a wonderful idea. In fact, I just looked into the possibility, and it is definitely feasible. I wonder if we can use YouTube-based podcasts OR scraping...