Edward Beeching
Thanks @tcapelle. ZeRO-3 shards the optimizer state, gradients, and model weights across GPUs, so you should have more memory available. However, if you are tuning a 7B model you may...
This is so the config is compatible with a larger model, e.g. llama-2-70b. I think that for a 7B model no sharding will take place.
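For reference, here is a minimal sketch of what a ZeRO-3 config looks like when passed directly to the HF `Trainer`, assuming `deepspeed` is installed; the values are illustrative and not the handbook's recipe.

```python
from transformers import TrainingArguments

# Illustrative ZeRO-3 DeepSpeed config: stage 3 shards optimizer state,
# gradients and parameters across GPUs.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = TrainingArguments(
    output_dir="outputs",      # placeholder output path
    bf16=True,
    deepspeed=ds_config,       # the Trainer initializes DeepSpeed from this dict
)
```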
Hi, is the repo you are referring to this one or another one? It wasn't clear from your question which one you meant.
This is probably related to Flash Attention being disabled and the large prompt limit of 4096. Are you using DeepSpeed? Do the Yi models not support flash-attn?
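As a quick check, a minimal sketch of loading a model with FlashAttention-2 enabled (recent `transformers` assumed; the Yi checkpoint name is a placeholder for whichever model you are using):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-6B"  # placeholder checkpoint for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires the flash-attn package to be installed
)
```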
Thanks for all your questions and detailed analysis; there are a number of different things to address here.

### LoRA training
The official `zephyr-7b-beta` model used full training. We provide...
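For anyone who does want to fine-tune with adapters instead of full training, a minimal sketch of a PEFT LoRA config is below; the target modules and hyperparameters are illustrative, not the recipe used for `zephyr-7b-beta`.

```python
from peft import LoraConfig

# Illustrative LoRA adapter config for a causal LM; adjust rank and
# target modules for your model and memory budget.
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

A config like this can be passed to TRL's `SFTTrainer` via its `peft_config` argument so only the adapter weights are trained.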
We identified the regression using MT-Bench scores as well. We are rerunning the evals and other experiments internally to try and get to the root cause.
Hi @wlhgtc. In fact, `zephyr-7b-beta` was trained using an internal codebase. `zephyr-7b-dpo-full` was trained using code from this repo, with the same parameters as the internal codebase. This repo contains...
Hi @liutianlin0121 , sorry for the lack of updates. I have been cautiously working through PRs on our internal codebase to identify the root cause. I can confirm that I...
Hi @patchie , thanks for your question. Unfortunately, this sort of question is beyond the scope of the alignment handbook. Perhaps you / your company would be interested in talking...
Hi @daehuikim. We did not consider this use case. Are you unable to push the dataset to the Hub, even as a private dataset? Otherwise you would need to use...
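A minimal sketch of pushing a local dataset to the Hub as a private repo, assuming the data is in local JSONL files (file names and the repo id are placeholders):

```python
from datasets import load_dataset

# Load local JSONL files into a DatasetDict.
dataset = load_dataset("json", data_files={"train": "train.jsonl", "test": "test.jsonl"})

# Push to the Hub as a private dataset (requires being logged in via `huggingface-cli login`).
dataset.push_to_hub("your-username/your-dataset", private=True)
```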