
What are the eos_token_id and bos_token_id

leekum2018 opened this issue 1 year ago • 37 comments

In generate.py, the bos_token_id is 1 and the eos_token_id is 2:

model.config.bos_token_id = 1
model.config.eos_token_id = 2

However, in finetune.py, the tokenizer is loaded directly from the official llama checkpoint, where bos_token_id=0 and eos_token_id=0. How should I understand this discrepancy? Thank you!
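For reference, a quick way to see what the tokenizer and the model config each report is something like the following (a minimal sketch; the checkpoint name is just an example, substitute whatever base model you actually load):

from transformers import LlamaConfig, LlamaTokenizer

base_model = "decapoda-research/llama-7b-hf"  # example checkpoint; substitute your own

tokenizer = LlamaTokenizer.from_pretrained(base_model)
config = LlamaConfig.from_pretrained(base_model)

# Compare what the tokenizer and the model config each believe the special ids are.
print("tokenizer bos/eos/pad:", tokenizer.bos_token_id, tokenizer.eos_token_id, tokenizer.pad_token_id)
print("config    bos/eos/pad:", config.bos_token_id, config.eos_token_id, config.pad_token_id)

If the two disagree, fine-tuning and generation end up using different special-token ids.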

leekum2018 avatar Apr 05 '23 18:04 leekum2018

Same question. Does fine-tuning need the same configuration?

archwolf118 avatar Apr 06 '23 09:04 archwolf118

Same question here. I fine-tuned an alpaca-lora model using the author's code and found that it generates an <unk> instead of an <eos> at the end of the response, which causes some problems.

HillZhang1999 avatar Apr 07 '23 07:04 HillZhang1999

This is a huge issue. The https://huggingface.co/decapoda-research/llama-Xb-hf HF models have bad/incorrect token-id mappings for bos/eos compared to the original Meta llama. Now that lots of people are using them to generate models, the end result is bad.

Transformers head now fixes this issue but broke backward compatibility. You can pass use_fast=False to fall back to the old LlamaTokenizer (the old code path); transformers head defaults to LlamaTokenizerFast.
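For illustration, selecting the slow tokenizer looks roughly like this (the repo name is just an example):

from transformers import AutoTokenizer

# use_fast=False falls back to the slow, SentencePiece-based LlamaTokenizer;
# recent transformers otherwise defaults to LlamaTokenizerFast.
slow_tok = AutoTokenizer.from_pretrained("yahma/llama-7b-hf", use_fast=False)
fast_tok = AutoTokenizer.from_pretrained("yahma/llama-7b-hf")
print(type(slow_tok).__name__, type(fast_tok).__name__)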

Qubitium avatar Apr 07 '23 08:04 Qubitium

Everyone needs to check out transformers head, use the latest export-to-HF script on the original Facebook models, and use that as the basis for future training with transformers head. Everyone needs to stop using the decapoda models, which will cause more and more issues the longer people keep training against their broken tokenizer mapping.

https://github.com/huggingface/transformers/pull/22402

Qubitium avatar Apr 07 '23 08:04 Qubitium

For reference, the following is the token mapping generated by the transformers-head conversion of the llama weights:

{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.28.0.dev0"
}

If the model you downloaded or are referencing has a tokenizer mapping that does not match the above, don't use it; just throw it away.
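A quick sanity check along those lines might look like this (a sketch; the repo name is an example, substitute the checkpoint you plan to train on):

from transformers import AutoConfig, AutoTokenizer

repo = "yahma/llama-7b-hf"  # example; substitute your own checkpoint
tokenizer = AutoTokenizer.from_pretrained(repo, use_fast=False)
config = AutoConfig.from_pretrained(repo)

# Expect bos=1 and eos=2 on both, per the mapping above.
assert tokenizer.bos_token_id == 1 and config.bos_token_id == 1
assert tokenizer.eos_token_id == 2 and config.eos_token_id == 2
print("bos/eos ids match the expected mapping")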

Qubitium avatar Apr 07 '23 09:04 Qubitium

@diegomontoya Thanks for your prompt reply, which addresses my confusion. I have another question: according to finetune.py, each training sequence has an EOS token appended during preprocessing, so models trained on this data should tend to generate sentences ending with an [EOS]. However, when I use the checkpoint provided in this repo to generate, the generated sentences end with [EOS] [BOS] instead of a single [EOS]. Is that normal?
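(For context, the preprocessing step referred to is roughly the following; this is a paraphrased sketch of the tokenize helper in finetune.py, with cutoff_len standing in for the configured truncation limit, not the repo's exact code.)

def tokenize_with_eos(tokenizer, prompt, cutoff_len=256):
    # Tokenize, then make sure the sequence ends with EOS unless it was truncated.
    result = tokenizer(prompt, truncation=True, max_length=cutoff_len, padding=False)
    if (
        result["input_ids"][-1] != tokenizer.eos_token_id
        and len(result["input_ids"]) < cutoff_len
    ):
        result["input_ids"].append(tokenizer.eos_token_id)
        result["attention_mask"].append(1)
    return result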

leekum2018 avatar Apr 07 '23 13:04 leekum2018

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

gururise avatar Apr 08 '23 14:04 gururise

Everyone needs to check out transformer head and use the latest export to hf script on the original facebook models and use that as bases for future training using transformer[head]. Everyone needs to stop using decapoda models which will cause more and more issues the more you use it's broken tokenizer mapping for training.

huggingface/transformers#22402

Is decapoda aware? They might be willing to update their models.

ehartford avatar Apr 10 '23 17:04 ehartford

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

With the 13B model, the decapoda-research upload is 38G, while the model here is about 26G. Could you tell me what the difference between them is, please?

alisyzhu avatar Apr 11 '23 07:04 alisyzhu

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here: 7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

With the 13B model, the decapoda-research upload is 38G, while the model here is about 26G. Could you tell me what the difference between them is, please?

Interesting observation. Meta AI's original LLaMA 13B model weights are 26G in size.
I don't know why the decapoda-research 13B model is 38G in size.
The yahma/llama-13b-hf was converted using the latest transformers git and matches the original 26G size for the model weights.

gururise avatar Apr 11 '23 16:04 gururise

@diegomontoya Thanks for your prompt reply, which addresses my confusion. I have another question: according to finetune.py, each training sequence has an EOS token appended during preprocessing, so models trained on this data should tend to generate sentences ending with an [EOS]. However, when I use the checkpoint provided in this repo to generate, the generated sentences end with [EOS] [BOS] instead of a single [EOS]. Is that normal?

I also got the same result after fine-tuning on my end. Has anybody found a workaround?

louisoutin avatar Apr 11 '23 20:04 louisoutin

For me, doing the following:

model.config.pad_token_id = tokenizer.pad_token_id = 0  # same as unk token id
model.config.bos_token_id = tokenizer.bos_token_id = 1
model.config.eos_token_id = tokenizer.eos_token_id = 2

and then saving the tokenizer with tokenizer.save_pretrained(path) and loading it back from that file resolved the issue of the unk token being generated instead of the eos token. (But now I'm having the same issue diegomontoya mentioned.)
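Putting that together, a consolidated sketch of the workaround (the model name and output path are placeholders):

from transformers import LlamaForCausalLM, LlamaTokenizer

base_model = "yahma/llama-7b-hf"      # placeholder; use your own base model
tokenizer_dir = "./fixed-tokenizer"   # placeholder output path

model = LlamaForCausalLM.from_pretrained(base_model)
tokenizer = LlamaTokenizer.from_pretrained(base_model)

# Force the ids described above onto both the model config and the tokenizer.
model.config.pad_token_id = tokenizer.pad_token_id = 0  # same as the unk token id
model.config.bos_token_id = tokenizer.bos_token_id = 1
model.config.eos_token_id = tokenizer.eos_token_id = 2

# Persist the corrected tokenizer and reload it from disk so later runs pick it up.
tokenizer.save_pretrained(tokenizer_dir)
tokenizer = LlamaTokenizer.from_pretrained(tokenizer_dir)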

louisoutin avatar Apr 11 '23 20:04 louisoutin

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Thank you very much! This solved a very annoying inference bug related to the tokenizer's padding token that would sometimes show up. If I changed the padding token, it would just show up in another batch after a while. For people who might land on this page via Google, this is the error I used to (only sometimes) get:

../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [0,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [1,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [2,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [3,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [4,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [5,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [6,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [7,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [24,0,0], thread: [8,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
...
  File ".../lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Thanks to your uploaded models the issue somehow got fixed!
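(For anyone who hits this later: that assertion is what you get when a token id falls outside the model's embedding table, e.g. a pad or eos id the checkpoint does not actually define. A tiny CPU-only sketch of the same failure mode, purely for illustration:)

import torch

emb = torch.nn.Embedding(32000, 16)    # 32000-entry table, like llama's vocabulary
bad_ids = torch.tensor([1, 2, 32000])  # 32000 is out of range for this table
try:
    emb(bad_ids)
except IndexError as err:
    # On GPU the same lookup surfaces as the srcIndex < srcSelectDimSize assert above.
    print("out-of-range token id:", err)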

NoahVl avatar Apr 15 '23 12:04 NoahVl

Any updates on this? Is everything good now? Can we fix old models by changing the tokenizer config, or not?

teknium1 avatar Apr 17 '23 04:04 teknium1

@teknium1 You need to retrain on the fixed/updated base HF models. Anything trained using the old transformers code on the decapoda models is bound to break. You can hack your way around the differing token ids, but I wouldn't recommend it.

Qubitium avatar Apr 17 '23 05:04 Qubitium

@teknium1 You need to retrain on the fixed/updated base HF models. Anything trained using the old transformers code on the decapoda models is bound to break. You can hack your way around the differing token ids, but I wouldn't recommend it.

Does this mean that I have to download a new llama-hf model and retrain, or can I just use the old one with the newest transformers code and LlamaTokenizer?

yzxyzh avatar Apr 17 '23 15:04 yzxyzh

@teknium1 You need to retrain on the fixed/updated base HF models. Anything trained using the old transformers code on the decapoda models is bound to break. You can hack your way around the differing token ids, but I wouldn't recommend it.

Does this mean that I have to download a new llama-hf model and retrain, or can I just use the old one with the newest transformers code and LlamaTokenizer?

I think it means either training on a llama model that was converted to HF format recently, or doing the conversion yourself with the latest transformers. Unfortunately, the best fine-tuned models right now are all based on the old format. The only thing I can do at the moment is revert to an older transformers commit to resolve it.

teknium1 avatar Apr 17 '23 16:04 teknium1

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Hi @gururise, is it possible to upload llama-30B and llama-65B as well? Thanks!

HZQ950419 avatar Apr 19 '23 13:04 HZQ950419

I would like to report that all of Neko's tokenizers are current and match https://huggingface.co/oobabooga/llama-tokenizer. Also, if you want me to update anything in the future, just bug me here or on Neko.

USBhost avatar Apr 25 '23 15:04 USBhost

@USBhost Your contributions are appreciated!

ehartford avatar Apr 25 '23 16:04 ehartford

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Unfortunately, unlike the decapoda-research/llama-7b-hf model, the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using a Tesla T4 GPU). It just aborts with "^C" during the loading-checkpoint-shards stage (which can be demonstrated using test.py from https://github.com/tloen/alpaca-lora/pull/364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

jploski avatar May 02 '23 21:05 jploski

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here: 7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

Try Neko or Elinas' repos

teknium1 avatar May 02 '23 21:05 teknium1

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here: 7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Unfortunately, unlike the decapoda-research/llama-7b-hf model the new yahma/llama-7b-hf does not load in a free Google Colab notebook (using Tesla T4 GPU). It just aborts with "^C" during the loading checkpoint shards stage (which can be demonstrated using test.py from #364). I suspect that it runs out of RAM because of the shard size and the Python process gets killed. Would it be possible for you to (re)publish this model split into several smaller shards (or is there some simple procedure to split it after downloading)?

Try Neko or Elinas' repos

elinas/llama-7b-hf-transformers-4.29 and Neko-Institute-of-Science/LLaMA-7B-HF both suffer from the same problem. They also both use the same two-big-shards config, which confirms my suspicion that it is the cause (I can also see the RAM peaking and the process aborting when the 12.68 GB limit is hit; I'm talking about system RAM here, not GPU RAM).

So to sum it up, it would be nice to have a test configuration which can execute in the free Google Colab notebook, which I know is technically possible because decapoda-research/llama-7b-hf can be trained there (although the training produces wrong results).

(I also tried Kaggle, but there it fails because of the 20GB disk space limit.)

jploski avatar May 02 '23 21:05 jploski

So to sum it up, it would be nice to have a test configuration which can execute in the free Google Colab notebook - which I know is technically possible because decapoda-research/llama-7b-hf can be trained there (although the training produces wrong results).

I uploaded jploski/llama-7b-hf, which allows just this. It uses 34 checkpoint shards, but is otherwise identical to yahma/llama-7b-hf. (And the results of test.py from https://github.com/tloen/alpaca-lora/pull/364 are ok when the final LoRA weights from it are fed to generate.py.)
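For reference, re-sharding a downloaded checkpoint can be done roughly like this (a sketch; the output path and shard size are placeholders, not necessarily what was used for that upload, and it still needs enough system RAM to load the model once):

from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("yahma/llama-7b-hf", torch_dtype="auto")
# max_shard_size controls how the saved state dict is split across checkpoint files.
model.save_pretrained("./llama-7b-hf-resharded", max_shard_size="1GB")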

jploski avatar May 04 '23 12:05 jploski

For everyone's convenience, I've uploaded llama models converted with the latest transformer git head here:

7B - https://huggingface.co/yahma/llama-7b-hf 13B - https://huggingface.co/yahma/llama-13b-hf

Do we have alpaca-lora weights based on these new models?

Opdoop avatar May 06 '23 12:05 Opdoop

Hi @gururise. Thanks for sharing the models! I guess these two LoRA weights are based on the new llama models, am I right?

7B - https://huggingface.co/yahma/alpaca-7b-lora 13B - https://huggingface.co/yahma/alpaca-13b-lora

Opdoop avatar May 06 '23 14:05 Opdoop

Hi @gururise. Thanks for sharing the model! I guess these two lora weights are based on new llama models, am I right?

7B - https://huggingface.co/yahma/alpaca-7b-lora 13B - https://huggingface.co/yahma/alpaca-13b-lora

Yes, they are both based on the new llama models.

gururise avatar May 12 '23 00:05 gururise

I used alpaca-lora to fine-tune on top of openlm-research's open llama model. Now I'm getting lots of <unk> tokens in my output. What's weird is that I swear it didn't do this earlier; perhaps I reinstalled the dependencies and that affected it?

Can someone please help me understand what actually changed in the tokens? Which token ids changed, and which mapping is "correct"? And if anyone knows whether openlm's model uses the "correct" tokenizer, that would also help me a ton. Appreciated.

nevercast avatar May 31 '23 23:05 nevercast

There is still something wrong. I replaced decapoda-research/llama-7b-hf with yahma/llama-7b-hf, and I found that its tokenizer has no pad_token and pad_token_id. Its special tokens are as follows: <unk> 0, <bos> 1, <eos> 2. So what exactly are the special tokens and their ids in the original llama? Do I have any misunderstanding?
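(For what it's worth, that is expected: the original llama tokenizer defines <unk>=0, <s>=1, </s>=2 and no pad token at all, so you have to pick one yourself. A minimal sketch, with the repo name as an example; reusing id 0 for padding is the convention this repo's finetune.py uses:)

from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("yahma/llama-7b-hf")
print(tokenizer.bos_token_id, tokenizer.eos_token_id, tokenizer.pad_token_id)  # expect 1, 2, None

# No pad token ships with the original llama tokenizer; reuse id 0 (<unk>) for padding.
tokenizer.pad_token_id = 0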

Kong-Aobo avatar Jun 18 '23 12:06 Kong-Aobo

You should use huggyllama


ehartford avatar Jun 18 '23 19:06 ehartford