h2o-llmstudio
Length Consistency in LLM Outputs with Token Length based Penalty Loss Functions
Reopened the same PR #556 with the correct local branch, as suggested by @psinger.
Adding support for custom loss functions aimed at improving the length consistency of responses generated by fine-tuned LLMs. The idea is to make the output lengths of LLMs more reflective of the token lengths observed in the training data (a rough sketch follows the list below). I ran several experiments with these loss functions and observed very little deviation in model performance.
The loss functions implemented are:
- LengthBasedTACE (Token Averaged Cross Entropy)
- LengthBasedSACE (Sample Averaged Cross Entropy)
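For illustration only, here is a minimal PyTorch sketch of what a token-averaged, length-penalized cross entropy could look like. The class name `TokenLengthPenaltyTACE`, the arguments `eos_token_id` and `length_penalty_coef`, and the particular penalty formulation (a differentiable expected-length surrogate built from the per-position EOS probability) are illustrative choices for this sketch, not necessarily the exact implementation in this PR.

```python
import torch
import torch.nn.functional as F


class TokenLengthPenaltyTACE(torch.nn.Module):
    """Token-averaged cross entropy plus a length-mismatch penalty (sketch)."""

    def __init__(self, eos_token_id: int, length_penalty_coef: float = 0.1,
                 ignore_index: int = -100):
        super().__init__()
        self.eos_token_id = eos_token_id
        self.length_penalty_coef = length_penalty_coef
        self.ignore_index = ignore_index

    def forward(self, logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # logits: (batch, seq_len, vocab); labels: (batch, seq_len)
        ce = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)),
            labels.reshape(-1),
            ignore_index=self.ignore_index,
        )  # token-averaged cross entropy

        mask = (labels != self.ignore_index).float()   # supervised positions
        target_len = mask.sum(dim=1)                   # label token length

        # Differentiable surrogate for the predicted length: at each supervised
        # position, the probability that the model keeps generating (does not
        # emit EOS), summed over the sequence.
        p_eos = F.softmax(logits, dim=-1)[..., self.eos_token_id]
        pred_len = ((1.0 - p_eos) * mask).sum(dim=1)

        # Relative mismatch between predicted and label length, batch-averaged.
        penalty = (torch.abs(pred_len - target_len) / target_len.clamp(min=1.0)).mean()

        return ce + self.length_penalty_coef * penalty
```

A sample-averaged variant (as in LengthBasedSACE) would average the cross-entropy term per sample before combining, but the penalty term would stay the same.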
Sharing some of the experiments I ran with these losses to compare against the original Cross Entropy loss:
Evaluation Results:
There could be some randomness involved in the eval metric, but I found a consistent decrease in LLM inference time, especially for models that score poorly and are prone to generating bad responses.
| Model | Loss Function | Time Taken (min) | Eval Metric |
|---|---|---|---|
| llama13B-Chat | Token Avg CE Loss | 40.45 | 0.810 |
| llama13B-Chat | TokenLengthPenalty Token Avg | 38.62 | 0.802 |
| llama7B-Chat | Token Avg CE Loss | 12.50 | 0.7684 |
| llama7B-Chat | TokenLengthPenalty Token Avg | 12.12 | 0.7484 |
| Yi-6B-Chat | Token Avg CE Loss | 18.50 | 0.792 |
| Yi-6B-Chat | TokenLengthPenalty Token Avg | 15.44 | 0.785 |
| llama13B-Chat | Token Avg CE Loss | 78.20 | 0.728 |
| llama13B-Chat | TokenLengthPenalty Token Avg | 76.60 | 0.744 |
| Yi-6B-Chat | Token Avg CE Loss | 24.44 | 0.712 |
| Yi-6B-Chat | TokenLengthPenalty Token Avg | 24.20 | 0.704 |
These functions use a length penalty coefficient; in my experiments I found a coefficient of 0.1 to be the most stable, so I kept it as the default. This should help close #537
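For illustration, a minimal usage of the sketch above (hypothetical names, dummy tensors), keeping the 0.1 default:

```python
import torch

# Hypothetical usage of the TokenLengthPenaltyTACE sketch above; the default
# coefficient of 0.1 is the value reported as most stable in the experiments.
logits = torch.randn(2, 16, 32000, requires_grad=True)   # (batch, seq, vocab)
labels = torch.randint(0, 32000, (2, 16))
labels[:, 12:] = -100                                     # unsupervised tail

loss_fn = TokenLengthPenaltyTACE(eos_token_id=2, length_penalty_coef=0.1)
loss = loss_fn(logits, labels)
loss.backward()
```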
@Nischaydnk will you find time to continue working on this?
closing this for now - please re-open in future