WizardLM
from train import smart_tokenizer_and_embedding_resize
weight_diff_wizard.py is importing from a train module, but there is no train module in the repo. I think it was renamed to train_freeform.py, but that file also has a now-nonexistent utils import. It seems some things were changed and a lot has broken...
It's trying to import utils.py from FastChat. If you clone lm-sys/FastChat, copy its utils.py, and rename train_freeform.py to train.py, it should work.
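Roughly, the workaround looks like the following Python sketch (a few shell commands work just as well), run from the directory that contains weight_diff_wizard.py and train_freeform.py; the exact location of utils.py inside the FastChat repo is an assumption on my part:

# Sketch of the workaround described above; paths are assumptions.
import shutil
import subprocess

# Clone FastChat to get the utils.py that train_freeform.py expects.
subprocess.run(["git", "clone", "https://github.com/lm-sys/FastChat.git"], check=True)

# Copy FastChat's utils.py next to the WizardLM training scripts.
shutil.copy("FastChat/fastchat/utils.py", "utils.py")

# Copy (or rename) train_freeform.py so `from train import ...` resolves again.
shutil.copy("train_freeform.py", "train.py")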
Did that. Still, for the 30B model it seems like you need more than 256 GB of RAM; I'm getting OOM errors on a 256 GB machine.
Add the function from https://github.com/tatsu-lab/stanford_alpaca/blob/main/train.py:
def smart_tokenizer_and_embedding_resize(
    special_tokens_dict: dict,
    tokenizer: transformers.PreTrainedTokenizer,
    model: transformers.PreTrainedModel,
):
    """Resize tokenizer and embedding.

    Note: This is the unoptimized version that may make your embedding size not be divisible by 64.
    """
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))

    if num_new_tokens > 0:
        input_embeddings = model.get_input_embeddings().weight.data
        output_embeddings = model.get_output_embeddings().weight.data

        input_embeddings_avg = input_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)
        output_embeddings_avg = output_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)

        input_embeddings[-num_new_tokens:] = input_embeddings_avg
        output_embeddings[-num_new_tokens:] = output_embeddings_avg
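For reference, the stanford_alpaca training script calls this function roughly like so (assuming tokenizer and model are already loaded); the added "[PAD]" token is what pushes the LLaMA vocabulary from 32000 to 32001:

# Rough usage, following the stanford_alpaca training script: add a pad token
# if the tokenizer has none, then resize the model's embeddings to match.
special_tokens_dict = {}
if tokenizer.pad_token is None:
    special_tokens_dict["pad_token"] = "[PAD]"

smart_tokenizer_and_embedding_resize(
    special_tokens_dict=special_tokens_dict,
    tokenizer=tokenizer,
    model=model,
)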
I managed to merge the 30B today by doing the following:
Using a Python 3.10 venv:
- git clone lm-sys/FastChat and run pip install . in the FastChat directory
- rename train_freeform.py to train.py
- install the other missing Python deps as the errors come up; there were quite a few
- add 200 GB of swap space (I previously had only 8 GB)
- merge the deltas per the commands provided, more or less:
  python weight_diff_wizard.py recover --path_raw ../../llama-30b/ --path_diff ../../WizardLM-30B-V1.0/ --path_tuned ../../Wizard-LM-30b-Merged/
Memory + swap peaked around 275 GB. My system has only 128 GB physical RAM (124 GB available to the LXC) and the rest was swap, so the merge took a while with all the swapping. The initial load was around 248 GB or so, then it crept up slowly during the merge and started declining after hitting ~275 GB. To do this merge entirely in memory I think you'd need 288 GB or more of system RAM.
My main problem came from working inside an LXC on Proxmox, so I had to add the swap disk to the host. Adding swap to the LXC that isn't available on the host just caused memory allocation failures. Adding the swap to both the host and the LXC got it working for me in this case.
It seems something is broken in the repo, but I solved it by:
- copying the smart_tokenizer_and_embedding_resize function definition from train_freeform.py and pasting it into weight_diff_wizard.py;
- deleting the line from train import smart_tokenizer_and_embedding_resize in weight_diff_wizard.py;
- changing the line from typing import Optional to from typing import Dict, Optional.
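For clarity, here is a sketch of what the relevant part of weight_diff_wizard.py looks like after those three changes; only the touched lines are shown, and the rest of the file is assumed unchanged:

# weight_diff_wizard.py after the patch (sketch; other imports unchanged)
from typing import Dict, Optional   # was: from typing import Optional

import transformers

# The line `from train import smart_tokenizer_and_embedding_resize` is removed;
# the function copied from train_freeform.py is pasted here instead.
def smart_tokenizer_and_embedding_resize(
    special_tokens_dict: Dict,
    tokenizer: transformers.PreTrainedTokenizer,
    model: transformers.PreTrainedModel,
):
    ...  # body as shown earlier in this thread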
When I try to merge, I get the following error:

RuntimeError: The size of tensor a (32000) must match the size of tensor b (32001) at non-singleton dimension 0

Do you have any ideas to solve it?
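Not a confirmed fix, but one guess: the delta checkpoint already includes an extra special token (32001 embedding rows) while the raw LLaMA checkpoint still has 32000, so the diff can't be applied element-wise. A minimal diagnostic sketch, assuming the paths from the merge command above and a "[PAD]" pad token (both assumptions, not confirmed in this thread):

# Sketch only: check whether the raw LLaMA checkpoint and the WizardLM delta
# disagree on vocabulary size (32000 vs. 32001).
import transformers

base = transformers.AutoModelForCausalLM.from_pretrained("../../llama-30b/")
delta_tokenizer = transformers.AutoTokenizer.from_pretrained("../../WizardLM-30B-V1.0/")

print(base.get_input_embeddings().weight.shape[0])  # e.g. 32000
print(len(delta_tokenizer))                         # e.g. 32001

# If they differ, one thing to try is resizing the base model first, e.g. with
# the smart_tokenizer_and_embedding_resize function above and
# special_tokens_dict=dict(pad_token="[PAD]"), so the shapes match before
# running weight_diff_wizard.py recover.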