WizardLM
from train import smart_tokenizer_and_embedding_resize
weight_diff_wizard.py is importing from a train module, but there is no train module in the repo. I think it was renamed to train_freeform.py, but that file also has a now-nonexistent utils import. It seems some things were changed and a lot has broken...
It's trying to import utils.py from FastChat. If you clone lm-sys/FastChat, copy its utils.py, and rename train_freeform.py to train.py, it should work.
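Roughly, the workaround looks like the following Python sketch (a few shell commands work just as well), run from the directory that contains weight_diff_wizard.py and train_freeform.py; the exact location of utils.py inside the FastChat repo is an assumption on my part:

# Sketch of the workaround described above; paths are assumptions.
import shutil
import subprocess

# Clone FastChat to get the utils.py that train_freeform.py expects.
subprocess.run(["git", "clone", "https://github.com/lm-sys/FastChat.git"], check=True)

# Copy FastChat's utils.py next to the WizardLM training scripts.
shutil.copy("FastChat/fastchat/utils.py", "utils.py")

# Copy (or rename) train_freeform.py so `from train import ...` resolves again.
shutil.copy("train_freeform.py", "train.py")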
Did that. Still, for the 30B model it seems like you need more than 256 GB of RAM; I'm getting OOM errors on a 256 GB machine.
Add the function from https://github.com/tatsu-lab/stanford_alpaca/blob/main/train.py:
def smart_tokenizer_and_embedding_resize(
    special_tokens_dict: dict,
    tokenizer: transformers.PreTrainedTokenizer,
    model: transformers.PreTrainedModel,
):
    """Resize tokenizer and embedding.

    Note: This is the unoptimized version that may make your embedding size not be divisible by 64.
    """
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    model.resize_token_embeddings(len(tokenizer))

    if num_new_tokens > 0:
        input_embeddings = model.get_input_embeddings().weight.data
        output_embeddings = model.get_output_embeddings().weight.data

        input_embeddings_avg = input_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)
        output_embeddings_avg = output_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)

        input_embeddings[-num_new_tokens:] = input_embeddings_avg
        output_embeddings[-num_new_tokens:] = output_embeddings_avg
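For reference, the stanford_alpaca training script calls this function roughly like so (assuming tokenizer and model are already loaded); the added "[PAD]" token is what pushes the LLaMA vocabulary from 32000 to 32001:

# Rough usage, following the stanford_alpaca training script: add a pad token
# if the tokenizer has none, then resize the model's embeddings to match.
special_tokens_dict = {}
if tokenizer.pad_token is None:
    special_tokens_dict["pad_token"] = "[PAD]"

smart_tokenizer_and_embedding_resize(
    special_tokens_dict=special_tokens_dict,
    tokenizer=tokenizer,
    model=model,
)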
I managed to merge the 30B today by doing the following:
Using a Python 3.10 venv:
- git clone lm-sys/FastChat and run pip install . in the FastChat directory
- rename train_freeform.py to train.py
- install the other missing Python deps as the errors come up; there were quite a few
- add 200 GB of swap space (I previously had only 8 GB)
- merge the deltas per the commands provided, more or less:
  python weight_diff_wizard.py recover --path_raw ../../llama-30b/ --path_diff ../../WizardLM-30B-V1.0/ --path_tuned ../../Wizard-LM-30b-Merged/
Memory + swap peaked around 275 GB. My system has only 128 GB physical RAM (124 GB available to the LXC) and the rest was swap, so the merge took a while with all the swapping. The initial load was around 248 GB or so, then it crept up slowly during the merge and started declining after hitting ~275 GB. To do this merge entirely in memory I think you'd need 288 GB or more of system RAM.
My main problem came from working inside an LXC on Proxmox, so I had to add the swap disk to the host. Adding swap to the LXC that isn't available on the host just caused memory allocation failures. Adding the swap to both the host and the LXC got it working for me in this case.
It seems something is broken in the repo, but I solved it by:
- copying the smart_tokenizer_and_embedding_resize function definition from train_freeform.py and pasting it into weight_diff_wizard.py;
- deleting the line from train import smart_tokenizer_and_embedding_resize in weight_diff_wizard.py;
- changing the line from typing import Optional to from typing import Dict, Optional.
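For clarity, here is a sketch of what the relevant part of weight_diff_wizard.py looks like after those three changes; only the touched lines are shown, and the rest of the file is assumed unchanged:

# weight_diff_wizard.py after the patch (sketch; other imports unchanged)
from typing import Dict, Optional   # was: from typing import Optional

import transformers

# The line `from train import smart_tokenizer_and_embedding_resize` is removed;
# the function copied from train_freeform.py is pasted here instead.
def smart_tokenizer_and_embedding_resize(
    special_tokens_dict: Dict,
    tokenizer: transformers.PreTrainedTokenizer,
    model: transformers.PreTrainedModel,
):
    ...  # body as shown earlier in this thread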
When I try to merge, I get the following error:

RuntimeError: The size of tensor a (32000) must match the size of tensor b (32001) at non-singleton dimension 0

Do you have any ideas to solve it?
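Not a confirmed fix, but one guess: the delta checkpoint already includes an extra special token (32001 embedding rows) while the raw LLaMA checkpoint still has 32000, so the diff can't be applied element-wise. A minimal diagnostic sketch, assuming the paths from the merge command above and a "[PAD]" pad token (both assumptions, not confirmed in this thread):

# Sketch only: check whether the raw LLaMA checkpoint and the WizardLM delta
# disagree on vocabulary size (32000 vs. 32001).
import transformers

base = transformers.AutoModelForCausalLM.from_pretrained("../../llama-30b/")
delta_tokenizer = transformers.AutoTokenizer.from_pretrained("../../WizardLM-30B-V1.0/")

print(base.get_input_embeddings().weight.shape[0])  # e.g. 32000
print(len(delta_tokenizer))                         # e.g. 32001

# If they differ, one thing to try is resizing the base model first, e.g. with
# the smart_tokenizer_and_embedding_resize function above and
# special_tokens_dict=dict(pad_token="[PAD]"), so the shapes match before
# running weight_diff_wizard.py recover.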