Thomas Capelle

Comments of Thomas Capelle

I would also use this PR to remove/replace the old stuff in wandb-artifacts (and put this file in there as a getting-started guide).

Can you make both `wandbcode` entries consistent?

I would also like more info about this. Do you use DeepSpeed to increase the batch size? A 7B model fits nicely on 80GB GPUs without any model parallelism.
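To make the question concrete, here is a minimal sketch of the kind of setup I am asking about (assuming the Hugging Face Trainer's DeepSpeed integration rather than this repo's exact recipe; names and batch-size values are illustrative): ZeRO-3 shards parameters, gradients, and optimizer state across the GPUs, which frees per-GPU memory for a larger batch even though the 7B weights alone fit on a single 80GB card.

```python
# Hypothetical sketch of a ZeRO-3 run via the Hugging Face Trainer
# (requires the `deepspeed` package); values are illustrative, not the repo's defaults.
from transformers import TrainingArguments

ds_zero3 = {
    # Stage 3 shards parameters, gradients, and optimizer state across GPUs,
    # so per-GPU memory use drops roughly with the number of GPUs.
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="zero3-test",           # illustrative path
    per_device_train_batch_size=8,     # illustrative value
    gradient_accumulation_steps=2,
    bf16=True,
    deepspeed=ds_zero3,                # Trainer builds the DeepSpeed engine from this dict
)
```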

Thanks for the prompt response =). BTW, outstanding preso at DL.ai, @edbeeching! What I am curious about is why use DeepSpeed ZeRO-3 when using 80GB GPUs: is it faster? or...

Yes, but in the README:

> Full fine-tuning on a multi-GPU machine with DeepSpeed ZeRO-3 (tested on an 8 x A100 (80GB) node)

I am curious about why you chose...

The DPO recipe with a 7B model with config_full gets me OOM, so I was wondering what I should reduce to keep the recipe consistent.

> I am on 8xA100...
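A rough sketch of the kind of adjustment I have in mind (the numbers below are assumptions, not the actual config_full values): lowering the per-device batch while raising gradient accumulation keeps the effective global batch size, and therefore the recipe, consistent while cutting activation memory.

```python
# Hypothetical numbers to illustrate keeping the effective batch size constant
# while reducing per-GPU memory pressure; not the real config_full values.
num_gpus = 8

# Setting that runs out of memory
per_device_bs = 8
grad_accum = 1
effective_bs = per_device_bs * grad_accum * num_gpus   # 64

# Memory-friendlier setting with the same effective batch size
per_device_bs_small = 2
grad_accum_large = 4
assert per_device_bs_small * grad_accum_large * num_gpus == effective_bs
```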

@jamie-rasmussen I was missing the `sidebar.ts` update.

I have read the CLA Document and I hereby sign the CLA

Great! Missing the docs, bro.

Hey, it depends. We probably should merge this into a working branch instead of main, as it introduces breaking changes and removes a lot of files. The scores obtained are...