Generate settings and MoE Loss
This PR addresses the following:
New `max_time` setting for generation, allowing to specify a maximum time in seconds per generation. Closes https://github.com/h2oai/h2o-llmstudio/issues/568
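A minimal sketch of how such a setting maps onto the Hugging Face `generate()` call; the model name and concrete values are placeholders, not the defaults wired up in LLM Studio:

```python
# Sketch: `max_time` caps generation at the given number of seconds,
# even if `max_new_tokens` has not been reached yet.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "h2oai/h2o-danube-1.8b-chat"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Summarize the following text:", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    max_time=5.0,  # stop generating after ~5 seconds
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```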
New `prompt_lookup_num_tokens` setting, as discussed in https://twitter.com/joao_gante/status/1747322413006643259
It will likely only help for summarization and QA tasks; default chat inference even got slower when using it.
But let's keep it as a setting one can try.
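Continuing the sketch above, prompt lookup decoding is switched on by passing `prompt_lookup_num_tokens` to `generate()`; the value here is only an example:

```python
# Prompt lookup decoding: candidate tokens are drawn from n-grams in the prompt
# itself, so it mainly pays off when the output copies spans of the input
# (summarization, extractive QA) and can slow down free-form chat.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    prompt_lookup_num_tokens=10,  # example value for the lookup window
)
```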
Adds a new loss function, MoECrossEntropy, that can be used for MoE models like Mixtral. It follows the auxiliary load-balancing loss of https://arxiv.org/pdf/2101.03961.pdf as implemented in https://github.com/huggingface/transformers/blob/v4.37.2/src/transformers/models/mixtral/modeling_mixtral.py#L77
First experiments with Mixtral and LoRA did not show a big impact. The overall scale of the loss is very similar to regular cross entropy, so the default additive term might be too low, but the recommended settings from the paper and HF are kept as defaults for now.
Needs more experimentation to better understand the impact. Closes https://github.com/h2oai/h2o-llmstudio/issues/607
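For reference, a rough sketch of such a loss: standard cross entropy plus the Switch Transformers auxiliary load-balancing term, computed from the router logits a MoE model like Mixtral returns with `output_router_logits=True`. Function names, shapes, and the coefficient are illustrative and not necessarily identical to the implementation in this PR:

```python
import torch
import torch.nn.functional as F


def load_balancing_loss(router_logits: torch.Tensor, num_experts: int, top_k: int = 2) -> torch.Tensor:
    # router_logits: (num_tokens, num_experts), concatenated over all MoE layers
    routing_weights = F.softmax(router_logits, dim=-1)
    _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
    expert_mask = F.one_hot(selected_experts, num_experts).float()
    # f_e: fraction of routing slots dispatched to each expert
    tokens_per_expert = expert_mask.mean(dim=(0, 1))
    # P_e: mean router probability assigned to each expert
    router_prob_per_expert = routing_weights.mean(dim=0)
    return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)


def moe_cross_entropy(logits, labels, router_logits, num_experts, top_k=2, aux_coef=0.01):
    # aux_coef is the additive scaling term discussed above (value here is illustrative)
    ce = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1), ignore_index=-100)
    aux = load_balancing_loss(torch.cat(router_logits, dim=0), num_experts, top_k)
    return ce + aux_coef * aux
```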
Maybe hold off on the review a bit; I am exploring the loss a bit more right now. With LoRA, it will probably not even properly train the gate (which can be a good thing).
Closing this for now.