Abdelrahman Abdallah
I have a solution for that: these 3 lines in your code must change: `train_pairs *= n_iters//len(train_pairs)`, `train_pairs += [random.choice(train_pairs) for i in range(n_iters%len(train_pairs))]`, `train_pairs = [tensorsFromPair(pair) for pair in...`
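Cleaned up, the three lines look roughly like this (the last line got cut off above, so its ending here is a reconstruction; this also assumes `n_iters >= len(train_pairs)` and that `tensorsFromPair` is already defined in the surrounding code):

```python
import random

# Repeat the training pairs until there are exactly n_iters of them,
# then convert each pair to tensors.
train_pairs *= n_iters // len(train_pairs)
train_pairs += [random.choice(train_pairs) for _ in range(n_iters % len(train_pairs))]
train_pairs = [tensorsFromPair(pair) for pair in train_pairs]  # ending reconstructed
```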
And please add `pip install dill==0.3.5.1`, because the current version of dill has removed `log` and will show this error: `AttributeError: module 'dill._dill' has no attribute 'log'`.
And lastly, could you please add the Arabic language? It is supported in `load_dataset` and can be downloaded as well.
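For example (OSCAR is used here only as an illustration of a corpus with an Arabic configuration; the dataset you actually use may be different):

```python
from datasets import load_dataset

# Arabic split of OSCAR, streamed so nothing huge is downloaded up front.
arabic = load_dataset("oscar", "unshuffled_deduplicated_ar", split="train", streaming=True)
print(next(iter(arabic))["text"][:200])
```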
I found an easy solution: to open ChatGPT, use the [Urban VPN Proxy](https://chrome.google.com/webstore/detail/urban-vpn-proxy/eppiocemhmnlbhjplcgkofciiegomcon) extension.
Set the URL in `download.sh` and also choose the target folder, then on Linux run `bash download.sh`, and on Windows use Git Bash. A sketch of what to edit is below.
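Something like this (the variable names here are hypothetical placeholders; use whatever names your `download.sh` actually defines at the top of the script):

```bash
# Inside download.sh (hypothetical variable names):
DOWNLOAD_URL="https://example.com/your-download-url"   # the URL you were given
TARGET_FOLDER="./models"                               # folder where the files should go

# Then run it from the repo folder:
#   Linux:              bash download.sh
#   Windows (Git Bash): bash download.sh
```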
@YSLLYW Yes, if you add new tokens to your tokenizer you should resize your model's embeddings: `model.resize_token_embeddings(len(tokenizer))`.
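For example, something like this (the model name is just a placeholder; use your own checkpoint):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Add the new tokens, then grow the embedding matrix to the new vocab size.
num_added = tokenizer.add_tokens(["<new_token_1>", "<new_token_2>"])
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))
```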
@YSLLYW did you change the original tokenizer or not?
Do you mean that in stage one you used [run_clm.py](https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_clm.py), and in stage two the output model from run_clm was then used in [merge_llama_with_chinese_lora.py](https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/scripts/merge_llama_with_chinese_lora.py)? Roughly like the sketch below?
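Just to check I understand the two stages (all paths are placeholders, and the flags are the ones I remember from the two scripts, so please double-check them against your setup):

```bash
# Stage 1: further pre-training with the HF language-modeling example
python run_clm.py \
    --model_name_or_path /path/to/llama-hf \
    --train_file /path/to/corpus.txt \
    --do_train \
    --output_dir /path/to/stage1_model

# Stage 2: merge the LoRA weights on top of the stage-1 model
python merge_llama_with_chinese_lora.py \
    --base_model /path/to/stage1_model \
    --lora_model /path/to/chinese_llama_lora \
    --output_dir /path/to/merged_model
```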
Can you give more explanation about that, please?
So did you train with run_clm.py two times?