Adding new tokens causes performance and memory issues
Bug description:
- Adding new tokens to the tokenizer and the corresponding embeddings causes the embeddings to be finetuned (`requires_grad = True` after `resize_token_embeddings()` is called; see the sketch after this list).
- The embeddings are updated, incurring a higher memory footprint than reported in our paper.
- The embeddings are not saved with the checkpoints, so a reloaded checkpoint will have reduced performance.
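The behavior can be reproduced with a minimal sketch along these lines (the model name and added tokens are placeholders, and the exact `requires_grad` behavior on resize depends on the `transformers` version):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model name, used only for illustration.
model_name = "huggyllama/llama-7b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Freeze the base model, as the QLoRA setup does before attaching adapters.
for param in model.parameters():
    param.requires_grad = False

# Add new tokens and resize the embedding matrix (and output projection) to match.
tokenizer.add_tokens(["<new_tok_1>", "<new_tok_2>"])
model.resize_token_embeddings(len(tokenizer))

# The resized modules are freshly created parameters, so they become trainable
# again, accumulate optimizer state, and are not covered by the adapter checkpoint.
print(model.get_input_embeddings().weight.requires_grad)   # True
print(model.get_output_embeddings().weight.requires_grad)  # True
```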
Note that this bug does not affect the results reported in the paper: in our research code, we explicitly froze the embeddings after initializing the model. A temporary fix is to do the same and freeze the embeddings. This fix is not satisfactory for use cases where new tokens need to be added and their representations tuned.
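A sketch of that temporary fix, continuing from the snippet above, is simply to re-freeze the resized weights:

```python
# Re-freeze the resized input embeddings and output projection so they are
# neither updated nor tracked by the optimizer.
model.get_input_embeddings().weight.requires_grad = False
model.get_output_embeddings().weight.requires_grad = False
```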
A more general fix would be to add LoRA layers to the embeddings, or to allow only the newly added embeddings to be trained. A LoRA layer on the embeddings might not work as well for the output projection layer (the layer mapping back to the vocabulary before the softmax).
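For the second option (training only the newly added embeddings), one possible sketch is to mask out gradients for the pre-existing rows with a tensor hook. Here `num_new_tokens` and the surrounding setup are assumptions carried over from the snippets above, and the optimizer still allocates state for the full matrices, so this only partially addresses the memory overhead:

```python
import torch

num_new_tokens = 2  # assumed number of tokens added above

for module in (model.get_input_embeddings(), model.get_output_embeddings()):
    weight = module.weight
    weight.requires_grad = True  # keep the new rows trainable

    def zero_old_rows(grad, n=num_new_tokens):
        # Keep gradients only for the last `n` rows, i.e. the newly added tokens,
        # which are appended at the end of the resized matrices.
        mask = torch.zeros_like(grad)
        mask[-n:] = 1.0
        return grad * mask

    weight.register_hook(zero_old_rows)
```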
Very good patch!
@artidoro is there a plan to implement fine tuning on new tokens in the vocabulary?
Thanks for the great repo!
Can you explain why "The LoRA layer for embeddings might not work as well on the output projection layer"? The problem I recently encountered may be related to this. Thanks a lot.
Need this feature +1