Musab Gultekin
Try disabling Instant Run
I fixed it by doing this right after the install-dependencies section:

```
!pip install jaxlib==0.1.67
```

Then restart the runtime if it asks. Though it feels so fragile. Don't...
I haven't tested this, but will this allow 70B on 8x80GB? I was only able to fully fine-tune 70B with CPU offloading.
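For reference, a minimal sketch of the kind of CPU-offloading setup that made the full 70B fine-tune fit for me, assuming PyTorch FSDP launched via torchrun; the tiny `Linear` is a placeholder standing in for the real model:

```
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload

# Assumes torchrun has set the usual rendezvous env vars.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Placeholder module standing in for the 70B model.
model = torch.nn.Linear(4096, 4096)

# offload_params=True keeps the sharded parameters in host RAM and copies
# them to the GPU only for compute -- slower, but it frees enough GPU
# memory for a full-weight fine-tune.
model = FSDP(
    model,
    cpu_offload=CPUOffload(offload_params=True),
    device_id=torch.cuda.current_device(),
)
```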
Are the hyperparams similar to the llama-2 instruct model's training? Otherwise, maybe we can also change some of the default hyperparams, such as the LR. I see it's set to 2e-5 for now,...
I could probably do a W&B sweep. Will definitely let you know when I have some results.
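In case it helps, a minimal sketch of what such a sweep could look like, assuming the W&B sweeps API; the search range, metric name, and project name are placeholders:

```
import wandb

# Illustrative sweep over the learning rate only.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "loss", "goal": "minimize"},
    "parameters": {
        "lr": {"distribution": "log_uniform_values", "min": 1e-6, "max": 1e-4},
    },
}

def train():
    run = wandb.init()
    lr = run.config.lr
    # ... the real training loop would go here, logging the swept metric:
    run.log({"loss": 0.0})  # placeholder value
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="finetune-sweeps")
wandb.agent(sweep_id, function=train, count=10)
```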
@rohan-varma Have you tested whether full-weight training works with 8x80GB? Maybe it works if we use 8-bit AdamW?
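A minimal sketch of what that swap could look like, assuming the `bitsandbytes` package; the `Linear` is a placeholder for the real model, and whether this actually fits on 8x80GB is untested:

```
import torch
import bitsandbytes as bnb

# Placeholder module standing in for the full model.
model = torch.nn.Linear(4096, 4096).cuda()

# 8-bit AdamW keeps the optimizer state in int8, cutting its memory
# roughly 4x versus fp32 state -- the extra headroom that full-weight
# training would need.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-5)
```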
We can also use `torch.nn.utils.clip_grad_norm_` instead of manually calculating the norms: https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html
We can use `float('inf')` instead of 1, so it doesn't clip?
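A minimal sketch of both suggestions together; the tiny model and dummy backward pass are placeholders:

```
import torch
from torch.nn.utils import clip_grad_norm_

model = torch.nn.Linear(8, 8)  # placeholder model
model(torch.randn(2, 8)).sum().backward()

# clip_grad_norm_ computes the total norm over all parameter gradients,
# rescales them in place if it exceeds max_norm, and returns the norm --
# so we keep the value for logging without computing it manually.
total_norm = clip_grad_norm_(model.parameters(), max_norm=1.0)

# With max_norm=float('inf') nothing is ever rescaled, so this is a pure
# "measure the grad norm" call.
unclipped_norm = clip_grad_norm_(model.parameters(), max_norm=float("inf"))
```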
@ebsmothers Lint fixed. Will make sure to run it next time. Thanks!
@leoentersthevoid That is a wonderful idea. In fact, I just looked into the possibility, and this is definitely feasible. I wonder if we can use YouTube-based podcasts OR scraping...