Export evaluation results
Hi, looking through the docs and code I cannot find a way to export the evaluation results. I'd like to be able to do something like
import weave

sample = ...  # input data
evaluator = ...  # eval function
model = ...  # weave Model

# define eval
evaluation = weave.Evaluation(
    dataset=sample,
    scorers=[evaluator],
    trials=1,
)

# run eval (top-level await works in a notebook; in a script, use asyncio.run(evaluation.evaluate(model)))
output = await evaluation.evaluate(model)

# how can I do something like this?
evaluated_rows = evaluation.results  # contains the traces with predictions and evaluator output
In other words, I would like to download this table
I feel like this should be a standard feature?
Hi @nthomsencph. Yes, we really need an export button. There was some discussion within the team about whether to hold off in favor of an export button that acts more intelligently (dealing with expanded refs and better matching the UI).
I will put up a PR that adds an export button, using the MUI Data Grid APIs to export the table to CSV. Note that this will not work with expanded refs (though that functionality should come relatively soon).
https://github.com/wandb/weave/pull/1606
Thanks for the quick reply. Looking forward to this feature.
A button in the UI would be great but I was looking for a programmatic way of exporting the table.
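To make the request concrete, here is a minimal sketch of the row shape I am after, collected by hand in plain Python. It bypasses weave entirely: `predict`, `exact_match`, the dataset fields, and the helper names are hypothetical stand-ins, not weave APIs.

```python
import csv
import io

# Hypothetical stand-in for the model: upper-cases the question.
def predict(question: str) -> str:
    return question.upper()

# Hypothetical stand-in for the evaluator: returns a dict of named scores.
def exact_match(expected: str, prediction: str) -> dict:
    return {"correct": expected == prediction}

# Hypothetical input data.
dataset = [
    {"question": "abc", "expected": "ABC"},
    {"question": "def", "expected": "xyz"},
]

def collect_rows(dataset, predict, scorer):
    """Build one flat row per example: inputs, prediction, and scores."""
    rows = []
    for example in dataset:
        prediction = predict(example["question"])
        scores = scorer(example["expected"], prediction)
        rows.append({**example, "prediction": prediction, **scores})
    return rows

def rows_to_csv(rows):
    """Serialize the rows to CSV text, using the first row's keys as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = collect_rows(dataset, predict, exact_match)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Ideally the evaluation itself would hand back rows like these, so the predictions and scores stay tied to the traces rather than being recomputed outside weave.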
Hi @nthomsencph! I just merged the Export to CSV PR, which adds a button in the UI and should hopefully help you make some headway. (Note that this export is not the final form of the feature; we plan to move the export to the server to make it faster and to fill in ref information.)
I can create an internal ticket for exporting the table programmatically. https://wandb.atlassian.net/browse/WB-18680
If possible could you please specify what you intend to do with the data? Understanding the purpose would help us build for that use case more directly.
Hi @jwlee64 - Thanks for the swift reply and action. I will check it out today.
Please submit an internal ticket as well.
The reason for the programmatic export is that we run multiple experiments with different LLMs in our research. We want the freedom to fetch the table from each evaluation so that we can derive descriptive statistics and run, for example, hypothesis tests and correlation analyses.
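As a rough illustration of that downstream analysis, here is a minimal sketch using only the Python standard library. It assumes an exported CSV with `model`, `example_id`, `score`, and `latency_s` columns; those column names and the sample values are made up for illustration, and the real export may look different.

```python
import csv
import io
import statistics

# Hypothetical stand-in for an exported evaluation table (assumed columns).
EXPORTED_CSV = """model,example_id,score,latency_s
model_a,1,0.80,1.2
model_a,2,0.60,1.9
model_a,3,0.90,1.1
model_b,1,0.70,0.8
model_b,2,0.50,1.4
model_b,3,0.60,1.0
"""

def load_rows(text):
    """Parse CSV text and convert the numeric columns to floats."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["score"] = float(row["score"])
        row["latency_s"] = float(row["latency_s"])
    return rows

def describe_by_model(rows):
    """Per-model count, mean, and sample standard deviation of the score."""
    groups = {}
    for row in rows:
        groups.setdefault(row["model"], []).append(row["score"])
    return {
        model: {
            "n": len(scores),
            "mean": statistics.mean(scores),
            "stdev": statistics.stdev(scores),
        }
        for model, scores in groups.items()
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rows = load_rows(EXPORTED_CSV)
summary = describe_by_model(rows)
score_latency_r = pearson(
    [row["score"] for row in rows],
    [row["latency_s"] for row in rows],
)
print(summary)
print(score_latency_r)
```

With a programmatic export, the hard-coded CSV text would be replaced by the table fetched from each evaluation, and the same rows could be passed on to a statistics package for hypothesis tests.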