
Adding new tokens causes performance and memory issues

Open · artidoro opened this issue 2 years ago · 4 comments

Bug description:

  • Adding new tokens to the tokenizer and the corresponding embeddings causes the embeddings to be finetuned (requires_grad = True after resize_token_embeddings() is called); a minimal check is sketched after this list.
  • The embeddings are updated, incurring a higher memory footprint than reported in our paper.
  • The embeddings are not saved with the checkpoints, so a reloaded checkpoint will have reduced performance.
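
For illustration, the first symptom can be checked by inspecting requires_grad on the (resized) embedding matrices right after adding tokens. This is only a sketch; the model name is an arbitrary assumption and any causal LM checkpoint should behave the same way:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # assumption: any causal LM checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Add a new token and resize the embedding matrices accordingly.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))

# Both the input embeddings and the output projection are left trainable,
# which is what triggers the extra memory use described above.
print(model.get_input_embeddings().weight.requires_grad)   # True
print(model.get_output_embeddings().weight.requires_grad)  # True
```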

Note that this bug does not affect the results reported in the paper: in our research code, we explicitly froze the embeddings after initializing the model. A temporary fix is to freeze the embeddings in the same way. This fix is not satisfactory for use cases where new tokens need to be added and their representations tuned.
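
For reference, the freeze can be applied right after resizing, before wrapping the model with PEFT or starting training. This is a sketch of the workaround, not the exact code used for the paper (continuing from the snippet above):

```python
def freeze_embeddings(model):
    """Disable gradients on the input embeddings and the output projection."""
    for module in (model.get_input_embeddings(), model.get_output_embeddings()):
        if module is not None:
            for param in module.parameters():
                param.requires_grad = False

model.resize_token_embeddings(len(tokenizer))
freeze_embeddings(model)  # temporary workaround: embeddings stay fixed during training
```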

A more general fix would be to add LoRA layers to the embeddings, or to allow only the new embeddings to be trained. A LoRA layer for the embeddings might not work as well on the output projection layer (the mapping back to the vocabulary before the softmax).
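
Both directions can be sketched roughly as below. The module names ("embed_tokens", "q_proj", "v_proj") are LLaMA-style and may differ for other architectures, and the old vocabulary size is an assumption; treat this as an illustration rather than a finished implementation:

```python
import torch
from peft import LoraConfig, get_peft_model

# (a) Put LoRA adapters on the embedding layer as well; PEFT supports
#     nn.Embedding targets in target_modules.
lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "embed_tokens"],
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, lora_config)

# (b) Alternatively, keep the full embedding matrix trainable but let only the
#     newly added rows receive gradient updates, via a gradient hook.
num_old_tokens = 32000  # assumption: vocabulary size before the new tokens were added
embed_weight = model.get_input_embeddings().weight
embed_weight.requires_grad_(True)

def _mask_old_rows(grad):
    grad = grad.clone()
    grad[:num_old_tokens] = 0  # original vocabulary rows are effectively frozen
    return grad

embed_weight.register_hook(_mask_old_rows)
```

Note that option (b) still keeps the full embedding matrix in optimizer state, and neither option by itself saves the resized embeddings with the adapter checkpoint; PEFT's modules_to_save option is one possible way to store them alongside the LoRA weights.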

artidoro · Jul 18 '23 11:07

Very good patch!

apachemycat · Jul 20 '23 02:07

@artidoro is there a plan to implement fine-tuning of new tokens in the vocabulary?

Thanks for the great repo!

victox5 · Jul 31 '23 07:07

Can you explain why "The LoRA layer for embeddings might not work as well on the output projection layer"? The problem I recently encountered may be related to this. Thanks a lot.

kongjiellx · Aug 04 '23 12:08

Need this feature +1

chenjiasheng · Aug 15 '23 12:08