alpaca-lora
What are the eos_token_id and bos_token_id
In generate.py, the bos_token_id=1 and eos_token_id=2,
model.config.bos_token_id = 1
model.config.eos_token_id = 2
However, in finetune.py, the tokenizer is loaded directly from the official llama checkpoint, where bos_token_id=0 and eos_token_id=0. How should I understand this discrepancy? Thank you!
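For reference, a quick way to see both sides of the mismatch is to print what the tokenizer and the model config each report. A minimal sketch, assuming a recent transformers install; the repo names are just examples of the checkpoints discussed in this thread:
# Minimal sketch: compare the special token ids reported by the tokenizer vs. the model config.
from transformers import AutoConfig, LlamaTokenizer

repo = "decapoda-research/llama-7b-hf"  # or "yahma/llama-7b-hf"

tok = LlamaTokenizer.from_pretrained(repo)
cfg = AutoConfig.from_pretrained(repo)

print("tokenizer:", tok.bos_token_id, tok.eos_token_id, tok.pad_token_id)
print("config:   ", cfg.bos_token_id, cfg.eos_token_id, cfg.pad_token_id)
# A correctly converted checkpoint should report bos=1 and eos=2 on both sides.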
Same question: does fine-tuning need the same configuration?
Same question. I fine-tuned an alpaca-lora model using the author's code and found that it generates an <unk> instead of an <eos> at the end of the response, which causes some problems.
This is a huge issue. The https://huggingface.co/decapoda-research/llama-Xb-hf
HF models have incorrect bos/eos token id mappings compared to the original META llama. Now that lots of people are using them to generate models, the end result is bad.
Transformers head now fixes this issue but broke backward compatibility. You can pass use_fast=False
to use the old LlamaTokenizer code; transformers head defaults to LlamaTokenizerFast.
Everyone needs to check out transformers head, use the latest export-to-hf script on the original facebook models, and use that as the base for future training with transformers head. Everyone needs to stop using the decapoda models, which will cause more and more issues the more you use their broken tokenizer mapping for training.
https://github.com/huggingface/transformers/pull/22402
For reference, the following is the token mapping generated by converting the llama weights with transformers head:
{
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"transformers_version": "4.28.0.dev0"
}
If the model you downloaded or are referencing has a tokenizer that does not match the above, don't use it; just throw it away.
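A quick way to check a downloaded repo against the mapping above is to read its generation config. A hedged sketch; the repo name is a placeholder, and older conversions may not ship a generation_config.json at all:
# Read the repo's generation_config.json and compare it to the expected mapping above.
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("yahma/llama-7b-hf")  # placeholder repo name
print(gen_cfg.bos_token_id, gen_cfg.eos_token_id, gen_cfg.pad_token_id)  # expect 1, 2, 0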
@diegomontoya Thanks for your prompt reply, which addresses my confusion. I have another question: according to finetune.py, each training sequence is appended with an EOS token during preprocessing, so I would expect models trained on these data to generate sentences ending with an [EOS]. However, when I use the checkpoint provided in this repo to generate something, the generated sentences end with [EOS] [BOS] instead of a single [EOS]. Is that normal?
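For context, the preprocessing in finetune.py appends the EOS token roughly as sketched below; this is a paraphrase for illustration, not the exact repo code:
# Rough paraphrase of the EOS handling during preprocessing (illustrative, not the exact code).
def tokenize(prompt, tokenizer, cutoff_len=256, add_eos_token=True):
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=cutoff_len,
        padding=False,
        return_tensors=None,
    )
    # Append EOS only if there is room and it is not already the last token.
    if (
        add_eos_token
        and len(result["input_ids"]) < cutoff_len
        and result["input_ids"][-1] != tokenizer.eos_token_id
    ):
        result["input_ids"].append(tokenizer.eos_token_id)
        result["attention_mask"].append(1)
    result["labels"] = result["input_ids"].copy()
    return result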
For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:
7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf
Everyone needs to check out transformers head, use the latest export-to-hf script on the original facebook models, and use that as the base for future training with transformers head. Everyone needs to stop using the decapoda models, which will cause more and more issues the more you use their broken tokenizer mapping for training.
Is decapoda aware? They might be willing to update their models.
For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:
7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf
For the 13B model, the decapoda_research upload is 38G, while the model here is about 26G. Could you tell me what the difference between them is, please?
For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here: 7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf
For the 13B model, the decapoda_research upload is 38G, while the model here is about 26G. Could you tell me what the difference between them is, please?
Interesting observation.
META AI's original LLAMA 13B model weights are 26G in size.
I don't know why the decapoda_research 13b model is 38G in size.
The yahma/llama-13b-hf was converted using the latest transformers git and matches the original 26G size for the model weights.
@diegomontoya Thanks for your prompt reply, which addresses my confusion. I have another question: according to finetune.py, each training sequence is appended with an EOS token during preprocessing, so I would expect models trained on these data to generate sentences ending with an [EOS]. However, when I use the checkpoint provided in this repo to generate something, the generated sentences end with [EOS] [BOS] instead of a single [EOS]. Is that normal?
Also got the same after finetuning on my end. Anybody found a workaround?
For me, doing:
model.config.pad_token_id = tokenizer.pad_token_id = 0  # same as the unk token id
model.config.bos_token_id = tokenizer.bos_token_id = 1
model.config.eos_token_id = tokenizer.eos_token_id = 2
then saving the tokenizer with
tokenizer.save_pretrained(path)
and loading the tokenizer from that path helped resolve the issue of an unk token being generated instead of the eos token. (But now I'm having the same issue as diegomontoya mentioned.)
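Putting that workaround together as a runnable sketch; the model and save paths are placeholders, and the id values assume the corrected mapping discussed in this thread:
# Sketch of the workaround: override the special token ids and persist the fixed tokenizer.
from transformers import LlamaForCausalLM, LlamaTokenizer

base_path = "path/to/base-llama"        # placeholder
fixed_path = "path/to/fixed-tokenizer"  # placeholder

model = LlamaForCausalLM.from_pretrained(base_path)
tokenizer = LlamaTokenizer.from_pretrained(base_path)

model.config.pad_token_id = tokenizer.pad_token_id = 0  # reuse unk as pad
model.config.bos_token_id = tokenizer.bos_token_id = 1
model.config.eos_token_id = tokenizer.eos_token_id = 2

tokenizer.save_pretrained(fixed_path)                   # writes the tokenizer json files
tokenizer = LlamaTokenizer.from_pretrained(fixed_path)  # reload from the fixed files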
For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:
7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf
Thank you very much! This solved a very annoying inference bug related to the tokenizer's padding token that would only sometimes show up. If I changed the padding token, it would just show up in another batch after a while. For people who might land on this page via Google, this is the error I used to (only sometimes) get:
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
File ".../lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Thanks to your uploaded models the issue somehow got fixed!
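For anyone else hitting this assertion: it usually means some input token id falls outside the embedding table (for example, a pad id of 32000 with a vocabulary of 32000 rows). A quick sanity check, sketched with model and input_ids standing in for your own objects:
# Sanity check: every token id fed to the model must be < the embedding table size.
import torch

vocab_size = model.get_input_embeddings().num_embeddings  # rows in the embedding table
bad = (input_ids < 0) | (input_ids >= vocab_size)
if bad.any():
    print("out-of-range token ids:", torch.unique(input_ids[bad]).tolist())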
Any updates on this? Are all things good now? Can we fix old models just by changing the tokenizer config?
@teknium1 You need to retrain on the fixed/updated base HF models. Anything trained using old transformers code on the decapoda models is bound to break. You can hack your way around the different token ids, but I wouldn't recommend it.
@teknium1 You need to retrain on the fixed/updated base HF models. Anything trained using old transformers code on the decapoda models is bound to break. You can hack your way around the different token ids, but I wouldn't recommend it.
Does this mean that I have to download a new llama-hf model and retrain, or can I just use the old one with the newest transformers code and LlamaTokenizer?
@teknium1 You need to retrain on the fixed/updated base HF models. Anything trained using old transformers code on the decapoda models is bound to break. You can hack your way around the different token ids, but I wouldn't recommend it.
Does this mean that I have to download a new llama-hf model and retrain, or can I just use the old one with the newest transformers code and LlamaTokenizer?
I think it means either training on a llama model converted to HF format recently, or doing the conversion yourself with the latest transformers. Unfortunately, the best fine-tuned models right now are all based on the old format. The only thing I can do at the moment is revert to an older transformers commit to resolve it.
For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:
7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf
Hi @gururise, is it possible to upload llama-30B and llama-65B as well? Thanks!
I would like to report that all of Neko's tokenizers are current and match https://huggingface.co/oobabooga/llama-tokenizer. Also, if you want me to update anything in the future, just bug me here or on Neko.
@USBhost Your contributions are appreciated!
For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:
7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf
Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from https://github.com/tloen/alpaca-lora/pull/364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?
For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here: 7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf
Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?
Try Neko or Elinas' repos
For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here: 7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf
Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?
Try Neko or Elinas' repos
elinas/llama-7b-hf-transformers-4.29 and Neko-Institute-of-Science/LLaMA-7B-HF both suffer from the same problem. They also both use the same two-big-shards config, which confirms my suspicion that it is the cause (I can also see the RAM peaking and the process aborting when the 12.68 GB limit is hit; I'm talking about system RAM, not GPU RAM here).
So to sum it up, it would be nice to have a test configuration which can execute in the free Google Colab notebook - which I know is technically possible because decapoda-research/llama-7b-hf can be trained there (although the training produces wrong results).
(I also tried Kaggle, but there it fails because of the 20GB disk space limit.)
So to sum it up, it would be nice to have a test configuration which can execute in the free Google Colab notebook - which I know is technically possible because decapoda-research/llama-7b-hf can be trained there (although the training produces wrong results).
I uploaded jploski/llama-7b-hf, which allows just this. It uses 34 checkpoint shards, but is otherwise identical to yahma/llama-7b-hf. (And the results of test.py from https://github.com/tloen/alpaca-lora/pull/364 are ok when the final LoRA weights from it are fed to generate.py.)
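If you would rather re-shard a download yourself, something along these lines should work on a machine with enough system RAM and disk; the repo name, output path, and shard size below are only example values:
# Re-save a downloaded checkpoint with smaller shards (needs enough RAM to hold the model).
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

src = "yahma/llama-7b-hf"              # example source repo
dst = "./llama-7b-hf-small-shards"     # example output path

model = LlamaForCausalLM.from_pretrained(src, torch_dtype=torch.float16, low_cpu_mem_usage=True)
model.save_pretrained(dst, max_shard_size="500MB")

tokenizer = LlamaTokenizer.from_pretrained(src)
tokenizer.save_pretrained(dst)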
For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:
7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf
Do we have alpaca-lora weights based on these new models?
Hi @gururise. Thanks for sharing the model! I guess these two lora weights are based on new llama models, am I right?
7B - https://huggingface.co/yahma/alpaca-7b-lora 13B - https://huggingface.co/yahma/alpaca-13b-lora
Hi @gururise. Thanks for sharing the model! I guess these two lora weights are based on new llama models, am I right?
7B - https://huggingface.co/yahma/alpaca-7b-lora 13B - https://huggingface.co/yahma/alpaca-13b-lora
Yes, they are both based on the new llama models.
I used alpaca-lora to fine-tune on top of openlm-research's open llama model. Now I'm getting lots of unk tokens in my output. What's weird is I swear it didn't do this earlier; perhaps I reinstalled the dependencies and that affected it?
Can someone please help me understand what actually changed in the tokens? Which token ids changed, and which is "correct"? And if anyone knows whether openlm's model uses the "correct" tokenizer, that would also help me a ton. Appreciated.
There is still something wrong. I replaced decapoda-research/llama-7b-hf with yahma/llama-7b-hf and found that its tokenizer has no pad_token or pad_token_id. Its special tokens are as follows: <unk> 0, <bos> 1, <eos> 2. So what exactly are the special tokens and their ids in the original llama? Am I misunderstanding something?
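As noted earlier in the thread, the original llama tokenizer only defines <unk>=0, <s>=1, </s>=2 and has no pad token; the usual workaround, sketched here against the yahma repo as an example, is to reuse the unk id for padding:
# Common workaround when the tokenizer has no pad token: reuse unk (id 0) for padding.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("yahma/llama-7b-hf")
tokenizer.pad_token_id = 0  # same id as <unk>, matching the mapping shown earlier in the thread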
You should use huggyllama