
Question about `--per-sequence-loss`

Open · Sanster opened this issue 1 year ago · 1 comment

In `generate_dataset.py` there is a `--per-sequence-loss` argument, which is used in `conversation_template.py`. This parameter further adjusts the loss weights based on the length of each response.

https://github.com/imoneoi/openchat/blob/30da91b20f11bf5aa268e84b6f5587caa37f510f/ochat/config/conversation_template.py#L104
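As a rough illustration of the weighting the question describes, here is a minimal sketch. It assumes a conversation is split into segments of (token count, whether the segment is trained) and that per-sequence mode gives each trained token a weight of 1 / (length of its response). The helper `per_sequence_weights` and the segment format are hypothetical and are not taken from `conversation_template.py`.

```python
from typing import List, Tuple


def per_sequence_weights(
    segments: List[Tuple[int, bool]], per_sequence_loss: bool
) -> List[float]:
    """Return one loss weight per token (hypothetical helper).

    segments: (num_tokens, is_trained) pairs, e.g. a user turn followed by
    an assistant response. Untrained tokens (prompts) always get weight 0.
    """
    weights: List[float] = []
    for num_tokens, is_trained in segments:
        if num_tokens == 0:
            continue  # nothing to weight; also avoids dividing by zero
        if not is_trained:
            weights.extend([0.0] * num_tokens)
        elif per_sequence_loss:
            # Each response's weights sum to 1.0, however long it is.
            weights.extend([1.0 / num_tokens] * num_tokens)
        else:
            # Every trained token counts equally.
            weights.extend([1.0] * num_tokens)
    return weights


# A 3-token prompt, then a 2-token and a 4-token response.
print(per_sequence_weights([(3, False), (2, True), (4, True)], per_sequence_loss=True))
# [0.0, 0.0, 0.0, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25]
```

With the flag off, the same input yields weight 1.0 for every response token, so longer responses contribute proportionally more to the loss.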

I would like to know: did you enable this parameter when training the OpenChat series of models? What impact does it have on the training results? Thanks!

Sanster · Feb 18 '24 07:02

When this parameter is enabled, the loss is averaged per sequence; otherwise it is averaged per token (the same as the HF trainer). It is disabled by default because it gave worse results in our experiments, making the model worse at longer responses.
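The difference between the two reductions can be sketched in plain Python. This assumes per-token losses are already computed and that prompt/ignored tokens have already been dropped; the function names are hypothetical. The per-token version is a plain mean over all trained tokens, which is the behavior the reply attributes to the HF trainer.

```python
from typing import List


def per_token_loss(seqs: List[List[float]]) -> float:
    """Mean over every trained token in the batch (each token weighted equally)."""
    tokens = [loss for seq in seqs for loss in seq]
    return sum(tokens) / len(tokens)


def per_sequence_loss(seqs: List[List[float]]) -> float:
    """Mean of per-sequence means (each sequence weighted equally)."""
    return sum(sum(seq) / len(seq) for seq in seqs) / len(seqs)


def token_weights(lengths: List[int], per_sequence: bool) -> List[List[float]]:
    """Effective weight each token receives in the final scalar loss."""
    if per_sequence:
        n_seqs = len(lengths)
        return [[1.0 / (n_seqs * n)] * n for n in lengths]
    total = sum(lengths)
    return [[1.0 / total] * n for n in lengths]


short_resp, long_resp = [1.0, 1.0], [3.0] * 8  # made-up per-token losses
print(per_token_loss([short_resp, long_resp]))     # 2.6
print(per_sequence_loss([short_resp, long_resp]))  # 2.0
print(token_weights([2, 8], per_sequence=False)[1][0])  # 0.1
print(token_weights([2, 8], per_sequence=True)[1][0])   # 0.0625
```

With a 2-token and an 8-token response, per-token averaging gives each token a weight of 0.1 (0.8 in total for the long response), while per-sequence averaging gives each token of the long response only 0.0625 (0.5 in total). Down-weighting tokens in long responses this way is one plausible explanation for the degradation on longer responses, though the thread does not spell out the mechanism.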

imoneoi · Feb 23 '24 12:02