
from train import smart_tokenizer_and_embedding_resize

HristoBuyukliev opened this issue 1 year ago · 7 comments

weight_diff_wizard.py imports from a train module, but there is no train module in the repo.
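
For reference, this is the failing import at the top of weight_diff_wizard.py; since the repo ships no train.py, Python raises ModuleNotFoundError: No module named 'train':

```python
# weight_diff_wizard.py: this import has no train.py in the repo to resolve
# against, so the script fails before doing any work.
from train import smart_tokenizer_and_embedding_resize
```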

HristoBuyukliev avatar May 30 '23 19:05 HristoBuyukliev

I think it was renamed to train_freeform.py, but that file also has a now-nonexistent utils import. It seems some things were moved around and a lot is broken...

paulhager avatar Jun 01 '23 14:06 paulhager

It's trying to import utils.py from FastChat. If you clone lm-sys/FastChat and copy its utils.py, and also rename train_freeform.py to train.py, it should work.

TheBloke avatar Jun 01 '23 14:06 TheBloke

> It's trying to import utils.py from FastChat. If you clone lm-sys/FastChat and copy its utils.py, and also rename train_freeform.py to train.py, it should work.

Did that. Still, for the 30B model it seems you need more than 256 GB of RAM; I'm getting OOM errors on a 256 GB machine.
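
One thing that might lower the peak, though I haven't verified it against this script: load the checkpoints in fp16 via transformers' low-memory path. A minimal sketch, assuming the script loads through from_pretrained (the path is a placeholder; torch_dtype and low_cpu_mem_usage are standard transformers arguments):

```python
import torch
import transformers

# fp16 roughly halves the resident size versus a default fp32 load, and
# low_cpu_mem_usage avoids materializing an extra full copy of the weights.
# Whether the delta arithmetic in weight_diff_wizard.py is safe in fp16 is
# untested here.
model = transformers.AutoModelForCausalLM.from_pretrained(
    "../../llama-30b/",  # placeholder path
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)
```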

windprak avatar Jun 09 '23 07:06 windprak

Add the function from https://github.com/tatsu-lab/stanford_alpaca/blob/main/train.py:


```python
import transformers  # needed if not already imported in weight_diff_wizard.py


def smart_tokenizer_and_embedding_resize(
    special_tokens_dict: dict,
    tokenizer: transformers.PreTrainedTokenizer,
    model: transformers.PreTrainedModel,
):
    """Resize tokenizer and embedding.

    Note: This is the unoptimized version that may make your embedding size
    not be divisible by 64.
    """
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))

    if num_new_tokens > 0:
        input_embeddings = model.get_input_embeddings().weight.data
        output_embeddings = model.get_output_embeddings().weight.data

        input_embeddings_avg = input_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)
        output_embeddings_avg = output_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)

        input_embeddings[-num_new_tokens:] = input_embeddings_avg
        output_embeddings[-num_new_tokens:] = output_embeddings_avg
```
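
For context, this mirrors how stanford_alpaca's train.py calls the function: if the tokenizer lacks a pad token, one is added and the embeddings are resized to match. A sketch under that assumption (paths are placeholders; "[PAD]" is the value stanford_alpaca uses):

```python
import transformers

DEFAULT_PAD_TOKEN = "[PAD]"  # value from stanford_alpaca's train.py

tokenizer = transformers.AutoTokenizer.from_pretrained("../../llama-30b/")  # placeholder path
model = transformers.AutoModelForCausalLM.from_pretrained("../../llama-30b/")

# Adding the pad token grows the vocab by one; the resize keeps the
# embedding matrices in sync with the tokenizer.
if tokenizer.pad_token is None:
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
        tokenizer=tokenizer,
        model=model,
    )
```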

apachemycat avatar Jun 14 '23 02:06 apachemycat

I managed to merge the 30B today by doing the following:

Using a Python 3.10 venv:

  1. git clone lm-sys/FastChat, then pip install . in the FastChat directory
  2. Rename train_freeform.py to train.py
  3. Install the other missing Python deps as the errors come up; there were quite a few
  4. Add 200 GB of swap space (I previously had only 8 GB)
  5. Merge the deltas per the commands provided, more or less: python weight_diff_wizard.py recover --path_raw ../../llama-30b/ --path_diff ../../WizardLM-30B-V1.0/ --path_tuned ../../Wizard-LM-30b-Merged/

Memory + swap peaked around 275 GB. My system has only 128 GB of physical RAM (124 GB available to the LXC) and the rest was swap, so the merge took a while with all the swapping. The initial load was around 248 GB; usage then crept up slowly during the merge and started declining after hitting ~275 GB. To do this merge entirely in memory, I think you'd need 288 GB or more of system RAM.

My main problem came from working inside an LXC on Proxmox, so I had to add the swap disk to the host. Adding swap to the LXC that wasn't available on the host just caused memory-allocation failures. Adding the swap to both the host and the LXC got it working for me.

yatesdr avatar Jun 14 '23 15:06 yatesdr

It seems something is broken in the repo, but I solved it by the following (sketched below):

  1. copying the smart_tokenizer_and_embedding_resize function definition from train_freeform.py and pasting it into weight_diff_wizard.py;
  2. deleting the line from train import smart_tokenizer_and_embedding_resize in weight_diff_wizard.py;
  3. modifying the line from typing import Optional to from typing import Dict, Optional.
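
A sketch of what the top of weight_diff_wizard.py ends up looking like after those three steps (the function body is the one posted above; the real file's surrounding imports may differ):

```python
from typing import Dict, Optional  # step 3: Dict added alongside Optional

import transformers

# step 1: pasted in from train_freeform.py; step 2 deleted the old
# `from train import smart_tokenizer_and_embedding_resize` line.
def smart_tokenizer_and_embedding_resize(
    special_tokens_dict: Dict,
    tokenizer: transformers.PreTrainedTokenizer,
    model: transformers.PreTrainedModel,
):
    ...  # body as in the snippet posted above
```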

Godsing avatar Jun 15 '23 07:06 Godsing

When I try to merge, I get the following error:

```
RuntimeError: The size of tensor a (32000) must match the size of tensor b (32001) at non-singleton dimension 0
```

Do you have any ideas to solve it?
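
A quick way to check where the off-by-one comes from; my assumption is that the tuned checkpoint carries the extra pad token, so its vocab is 32001 against the raw LLaMA's 32000:

```python
import transformers

# Compare vocab sizes of the two checkpoints; if they differ by one, the
# mismatch is the added pad-token row, and the raw model's embeddings need
# resizing (see smart_tokenizer_and_embedding_resize above) before the
# delta is applied.
tok_raw = transformers.AutoTokenizer.from_pretrained("../../llama-30b/")  # placeholder paths
tok_diff = transformers.AutoTokenizer.from_pretrained("../../WizardLM-30B-V1.0/")
print(len(tok_raw), len(tok_diff))
```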

young-chao avatar Jun 27 '23 06:06 young-chao