Thomas Capelle
I would also use this PR to remove/replace the old content in wandb-artifacts (and put this file in there as a getting-started guide).
Can you make both `wandbcode` entries consistent?
I would also like more info about this. Do you use DeepSpeed to increase the batch size? A 7B model fits nicely on 80GB GPUs without any model parallelism.
Thanks for the prompt response =). BTW, outstanding presentation at DL.ai, @edbeeching! What I am curious about is why use DeepSpeed ZeRO-3 when using 80GB GPUs: is it faster? or...
Yes, but in the README:

> Full fine-tuning on a multi-GPU machine with DeepSpeed ZeRO-3 (tested on an 8 x A100 (80GB) node)

I am curious about why you chose...
The DPO recipe with a 7B model with config_full gets me OOM, so I was wondering what I should reduce to keep the recipe consistent.

> I am on 8xA100...
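For reference, I assume the relevant knobs are the usual TrainingArguments/DPOTrainer fields in the recipe YAML; a minimal sketch of the kind of overrides I mean (the values are placeholders, not the recipe's defaults):

```yaml
# Hypothetical memory-saving overrides for the DPO full recipe.
# Keys are standard TrainingArguments/DPOTrainer fields; values are placeholders.
per_device_train_batch_size: 1    # smaller micro-batch per GPU
gradient_accumulation_steps: 16   # keep the effective global batch size roughly constant
gradient_checkpointing: true      # trade recompute for activation memory
max_length: 1024                  # shorter sequences also cut activation memory
```

The gradient accumulation bump is meant to keep the global batch size in line with the published recipe; is that the right direction?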
@jamie-rasmussen I was missing the sidebar.ts update
I have read the CLA Document and I hereby sign the CLA
Great! Missing the docs, though.
Hey, it depends. We probably should merge this into a working branch instead of main, as it introduces breaking changes and removes a lot of files. The scores obtained are...