weave Export evaluation results

Hi, looking through the docs and code, I cannot find a way to export the evaluation results. I'd like to be able to do something like this:

import weave

sample = ... # input data
evaluator = ... # eval function
model = ... # weave Model

# define eval
evaluation = weave.Evaluation(
    dataset=sample,
    scorers=[evaluator],
    trials=1,
)

# run eval (top-level await works in a notebook; in a script, wrap the call in asyncio.run)
output = await evaluation.evaluate(model)

# how can I do something like this?

evaluated_rows = evaluation.results # contains the traces with predictions and evaluator output

In other words, I would like to download this table.

I feel like this should be a standard feature?

May 06 '24 13:05 nthomsencph

Hi @nthomsencph. Yes, we really need an export button. There was some discussion within the team on whether to hold off until we had an export button that acts more intelligently (deals with expanded refs and better matches the UI).

I will put up a PR that adds an export button that uses the MUI Data Grid APIs to export the table to CSV. Note that this will not work with expanded refs (though that functionality should come relatively soon).

https://github.com/wandb/weave/pull/1606

May 07 '24 00:05 jwlee64

Thanks for the quick reply. Looking forward to this feature.

A button in the UI would be great but I was looking for a programmatic way of exporting the table.

May 07 '24 03:05 nthomsencph

Hi @nthomsencph! I just merged the Export to CSV PR, which adds a button in the UI and should hopefully help you make some headway. (Note that this export is not the final form of this feature. We plan on moving the export to the server to make it faster and to fill in ref information.)

I can create an internal ticket for exporting the table programmatically. https://wandb.atlassian.net/browse/WB-18680

If possible could you please specify what you intend to do with the data? Understanding the purpose would help us build for that use case more directly.

May 07 '24 21:05 jwlee64

Hi @jwlee64 - Thanks for the swift reply and action. I will check it out today.

Please submit an internal ticket as well.

The reason for the programmatic export is that we run multiple experiments with different LLMs in our research. We want the freedom to fetch the table from each evaluation so that we can derive descriptive statistics and run, e.g., hypothesis tests and correlation analyses.
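As a rough illustration of the kind of downstream analysis meant here, the sketch below computes per-model descriptive statistics and a Pearson correlation from a table that has already been exported to CSV. It uses only the Python standard library. The column names (`model`, `score`, `latency_s`) and the sample values are made up for illustration and are not the actual schema of a weave export.

```python
import csv
import io
import math
import statistics
from collections import defaultdict

# Hypothetical CSV content standing in for an exported evaluation table.
# Column names and values are assumptions, not the real weave export schema.
EXPORTED_CSV = """model,score,latency_s
model_a,0.80,1.2
model_a,0.90,1.5
model_a,0.70,1.1
model_b,0.60,0.8
model_b,0.65,0.9
model_b,0.55,0.7
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting numeric columns to float."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "model": row["model"],
            "score": float(row["score"]),
            "latency_s": float(row["latency_s"]),
        })
    return rows


def describe_by_model(rows):
    """Return count, mean, and sample standard deviation of score per model."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["model"]].append(row["score"])
    return {
        model: {
            "n": len(scores),
            "mean": statistics.mean(scores),
            "stdev": statistics.stdev(scores),
        }
        for model, scores in grouped.items()
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rows = load_rows(EXPORTED_CSV)
summary = describe_by_model(rows)
r = pearson([row["score"] for row in rows], [row["latency_s"] for row in rows])

for model, stats in summary.items():
    print(model, stats)
print("score vs latency correlation:", round(r, 3))
```

With one CSV per evaluation run, the same functions could be applied to each file and the per-model summaries compared across experiments. Hypothesis tests would need an additional library or a hand-written test statistic.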

May 08 '24 05:05 nthomsencph
