Alex Bäuerle
Hmm thank you for testing. Will debug.
I think fixed. Not sure if I caught all edge cases.
> @Sparkier I can't actually get this to work 🤔
>
> It seems to always send an empty LLM_API_KEY to the websocket. As @amanape points out, it also doesn't...
Correction @guneetsk99, I've started with the handcrafted tools [here](https://github.com/OpenDevin/OpenDevin/pull/682).
Awesome, really like the design. One thing I immediately noticed from the screenshots is that there are some issues with the tab design: 1. it is very hard to recognize...
@haileyschoelkopf I've started implementing tests for the Zeno upload functionality. I'm not sure where to put the test data (as we need more than just the metric results). Furthermore, you'll...
Regarding tests, we could run a test on our end. E.g., we have integration tests set up where we have a project created on push. Would not alert you if...
> Any other changes on our end to beautify Zeno projects created?

We've seen some patterns in useful metadata recently. For example, the model output length in freeform answers is...
For tests, see https://github.com/EleutherAI/lm-evaluation-harness/pull/1221
For additional metadata, see https://github.com/EleutherAI/lm-evaluation-harness/pull/1222