Thomas Broadley

Results: 109 comments by Thomas Broadley

I confess that I wouldn't know how to update my IDE to use these JSON Schemas. How about documenting that?

You're welcome, Nate! https://github.com/METR/vivaria/pull/389 should make it easier to test intermediate scoring, by changing Vivaria to always call `TaskFamily#intermediate_score` at least twice per run.

For now, maybe we just don't stop or tear down agent containers + aux VMs for runs if the scoring function returns None. In the future, we could add some logic...

Or move manual scoring from Airtable to MP4, and tear down the resources once manual scoring is complete.
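A minimal sketch of the "keep resources when scoring returns None" idea from the comments above. This is not Vivaria's code: `RunResources`, `should_tear_down`, and `finish_run` are hypothetical names, and the sketch assumes a score of None means manual scoring is still pending.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResources:
    """Hypothetical stand-in for a run's agent container and aux VM."""

    run_id: int
    torn_down: bool = False


def should_tear_down(score: Optional[float]) -> bool:
    """Return True only when automatic scoring produced a number.

    Assumption: a score of None means the run still needs manual scoring,
    so its resources should be kept around.
    """
    return score is not None


def finish_run(resources: RunResources, score: Optional[float]) -> RunResources:
    # Hypothetical cleanup hook: a real implementation would stop the
    # container and the aux VM here instead of flipping a flag.
    if should_tear_down(score):
        resources.torn_down = True
    return resources
```

The check is `score is not None` rather than a truthiness test, so a legitimate score of 0.0 still counts as scored and allows teardown.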

There are a couple of ways we collect OOM errors: 1. A command that Vivaria is running gets OOM-killed (in the case of scoring, I imagine this causes Vivaria to...

You'd need a Vivaria instance with Git support. Here's how to set that up: https://vivaria.metr.org/how-tos/git-support/ You might not have the right access to Git repos to set that up on...

Yes, `vite.config.js` is the place. It takes env variable values and injects them into the UI. `ui/src/util/auth0_client.ts` isn't METR-specific. Yeah, I don't think we have great documentation for env variables...

@sjawhar What do you think?

Oh, I could imagine this being easier to review if it were broken into two PRs: 1. All of the above changes, but don't move code from DriverImpl to Driver....

15 minutes thinking about this:

## Desiderata

- It's easy for users to show and hide different kinds of trace entries, in a more granular way than we currently support...