
Configure eval to give 'loss/eval' that is analogous to 'loss/train'


When I run with an eval set, I only get the `metrics/eval` metrics.

I am wondering if there is a way to configure llm-foundry via YAML to also compute `loss/eval` in the same way that it computes `loss/train`.

tginart avatar May 30 '23 02:05 tginart

Hi @tginart, yes, we can definitely add support for that.

Basically I think we just need to copy this code over to `eval.py` and test it out.

If you'd like to open a PR I'd be happy to review it! Otherwise I'll add it to my list and get to it soon.

abhi-mosaic avatar May 31 '23 01:05 abhi-mosaic

Hi @abhi-mosaic.

I am referring to the metrics from the composer Evaluator in `train.py` (https://github.com/mosaicml/llm-foundry/blob/3c66b1c5df668e0684548fef30d00669df64636c/scripts/train/train.py#LL158C1-L162C79)

So I'm not sure we are talking about the same thing?

I'm still running through `train.py`, not `eval.py`.

tginart avatar May 31 '23 04:05 tginart

[Screenshot (2023-06-01): WandB charts of the eval metrics]

Hi @tginart, is this what you're looking for? ^

The eval metrics such as `metrics/eval/LanguageCrossEntropyLoss` are computed every `eval_interval`, which I think defaults to `500ba`. You could certainly reduce this interval, but keep in mind that eval metrics are averaged over the entire eval dataloader, as opposed to the `loss/train` value, which is the live train loss of a single batch. So running eval more often would slow down training a lot.

If you want fine-grained eval without running over the whole eval dataloader, you can also set something like `eval_subset_num_batches: 10`, which runs over only the first 10 batches of the eval dataloader; see the sketch below.
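For reference, a minimal sketch of what those two knobs might look like in a train YAML (the values are illustrative, and the surrounding keys of your own config are assumed):

```yaml
# Illustrative values only, not recommendations.
eval_interval: 100ba          # how often eval runs (I believe the default is 500ba)
eval_subset_num_batches: 10   # evaluate only the first 10 batches of the eval dataloader
```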

Re: the WandB chart naming, I think you could change the line you linked with `label='eval'` to something like `label='eval/loss'`, but I believe Composer always suffixes the torchmetric name at the end, so you would end up with a chart titled `metrics/eval/loss/LanguageCrossEntropyLoss`.
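For concreteness, here is a sketch of that change; the surrounding variable names (`eval_loader`, `train_metrics`) are assumptions, not necessarily what `train.py` actually uses:

```python
from composer.core import Evaluator

# Hypothetical sketch of the Evaluator construction in train.py;
# the only change from the linked code is the label string.
evaluator = Evaluator(
    label='eval/loss',  # was label='eval'
    dataloader=eval_loader,  # assumed name for the eval dataloader
    metric_names=list(train_metrics.keys()),  # assumed: reuse the train metric names
)
# Composer appends the torchmetric name, so WandB would show a chart named
# metrics/eval/loss/LanguageCrossEntropyLoss
```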

abhi-mosaic avatar Jun 01 '23 21:06 abhi-mosaic

Hi @abhi-mosaic, thank you. That is what I am looking for, but I was wondering if it was possible to compute the `loss/train` quantity for every example in the eval set and then take the average. I would only want to do this at eval time. It seems like the way to do this is to extend a metric for composer and make sure it gets passed to here?

tginart avatar Jun 02 '23 18:06 tginart

Hi @tginart, yes. If you want to compute a custom metric (e.g. one that matches the loss), you can create the metric and pass it along in the place you described. I know you asked this question a while ago, but has it been resolved?
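A minimal sketch of such a metric, assuming a causal-LM setup where `update` receives `(batch, seq, vocab)` logits and `(batch, seq)` labels with `-100` as the ignore index; the class name and the exact wiring are illustrative, not llm-foundry's actual API:

```python
import torch
import torch.nn.functional as F
from torchmetrics import Metric

class AverageEvalLoss(Metric):
    """Hypothetical metric: averages per-batch cross-entropy over the eval set,
    mirroring what loss/train reports for a single train batch."""

    def __init__(self):
        super().__init__()
        self.add_state('loss_sum', default=torch.tensor(0.0), dist_reduce_fx='sum')
        self.add_state('batches', default=torch.tensor(0), dist_reduce_fx='sum')

    def update(self, logits: torch.Tensor, labels: torch.Tensor) -> None:
        # Assumed shapes: (batch, seq, vocab) logits and (batch, seq) labels,
        # with -100 marking positions to ignore.
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               labels.view(-1),
                               ignore_index=-100)
        self.loss_sum += loss.detach()
        self.batches += 1

    def compute(self) -> torch.Tensor:
        return self.loss_sum / self.batches
```

You would then attach the metric to the model's eval metrics and include its name in the Evaluator's `metric_names`, at the spot linked earlier in the thread.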

dakinggg avatar Sep 07 '23 02:09 dakinggg

Closing due to inactivity. Please feel free to reopen or open a new issue if this is not resolved.

dakinggg avatar Sep 15 '23 22:09 dakinggg