lorax

Supporting LmHead and Embedding Layers for Adapters

Open magdyksaleh opened this issue 1 year ago • 2 comments

System Info

Adapters don't work if you make changes to the vocab (e.g. by adding tokens, which resizes the embedding and lm_head layers). A rough sketch of the scenario follows below.
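
For illustration only, here is a minimal sketch of the kind of fine-tune that produces such an adapter, assuming PEFT is used; the base model, target modules, and added token are just placeholders, not taken from the issue:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "yujiepan/llama-2-tiny-random"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Adding a token changes the vocab size, so the embedding matrix must grow.
tokenizer.add_tokens(["|INST|"])
model.resize_token_embeddings(len(tokenizer))

# modules_to_save stores full copies of the (now resized) embedding and
# lm_head weights inside the adapter, alongside the LoRA matrices.
config = LoraConfig(
    r=8,
    target_modules=["q_proj", "v_proj"],
    modules_to_save=["embed_tokens", "lm_head"],
)
model = get_peft_model(model, config)

Serving such an adapter requires the base model's embedding and lm_head layers to be resized (or swapped out) to match, which is what this issue asks for.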

Information

  • [ ] Docker
  • [ ] The CLI directly

Tasks

  • [ ] An officially supported command
  • [ ] My own modifications

Reproduction

to come

Expected behavior

to come

magdyksaleh avatar Feb 08 '24 20:02 magdyksaleh

Context: https://stackoverflow.com/questions/72775559/resize-token-embeddings-on-the-a-pertrained-model-with-different-embedding-size

tgaddair avatar Feb 17 '24 00:02 tgaddair

Here's a code block that also demonstrates what might be needed:

>>> from transformers import AutoTokenizer, AutoModelForCausalLM
>>> model = AutoModelForCausalLM.from_pretrained("yujiepan/llama-2-tiny-random")
>>> tokenizer = AutoTokenizer.from_pretrained("yujiepan/llama-2-tiny-random")
>>> model.get_input_embeddings()
Embedding(32000, 8, padding_idx=0)
>>> len(tokenizer.vocab)
32000
>>> tokenizer.add_tokens(['|INST|'])
1
>>> len(tokenizer.vocab)
32001
>>> model.resize_token_embeddings(len(tokenizer.vocab))
Embedding(32001, 8)
>>> model.get_input_embeddings().padding_idx = 0  # resizing drops padding_idx, so save it beforehand and restore it afterwards
>>> model.get_input_embeddings()
Embedding(32001, 8, padding_idx=0)
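
Wrapping that pattern up, a small helper along these lines (the function name is purely illustrative) could keep padding_idx intact when resizing:

from transformers import PreTrainedModel, PreTrainedTokenizerBase

def resize_embeddings_keep_padding(model: PreTrainedModel, tokenizer: PreTrainedTokenizerBase) -> None:
    """Resize the token embeddings to the tokenizer's vocab size, preserving padding_idx."""
    # resize_token_embeddings builds a new Embedding without padding_idx,
    # so remember the old value and set it again afterwards.
    padding_idx = model.get_input_embeddings().padding_idx
    model.resize_token_embeddings(len(tokenizer))
    model.get_input_embeddings().padding_idx = padding_idx

Usage would then be a single call, e.g. resize_embeddings_keep_padding(model, tokenizer), after adding tokens to the tokenizer.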

arnavgarg1 avatar Feb 20 '24 18:02 arnavgarg1