Thomas Broadley

109 comments by Thomas Broadley

A non-exhaustive list of critical code paths:

- Killing runs using the "Kill" button
  - Probably want an E2E test for this
- Usage limits computation
- Killing runs that...

METR/mp4#1226 adds an E2E test for killing runs using the "Kill" button and improves the existing E2E test for killing runs that have passed their total seconds usage limit.

Thinking about tests for generations:

- It'd be good to have a test that checks that we can call `/generate` with a certain known-to-work generation request. All the way back...

Now that `mp4 ssh` exports environment variables, it should be possible to SSH into an existing agent container and do `python main.py`.

🤷 People learn in different ways, seems good to have both modalities.

To what extent METR will support other users of Vivaria, e.g. how quickly we'll respond to GitHub issues and emails, what kinds of PRs we're interested in receiving.

We do have some code connecting the DB structure to TypeScript objects -- please see `tables.ts`.
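
For readers unfamiliar with the pattern, here is a minimal sketch of one common way to connect a database row to a typed TypeScript object. It is an illustration only: `Example`, `exampleFromRow`, and the column names are hypothetical and do not reflect the actual contents of `tables.ts`.

```typescript
// Hypothetical application-level object (camelCase fields).
// The assumed DB columns are snake_case: id, task_id, created_at.
interface Example {
  id: number
  taskId: string
  createdAt: Date
}

// Validate an untyped row at runtime, then convert it to the typed object.
function exampleFromRow(raw: Record<string, unknown>): Example {
  const { id, task_id, created_at } = raw
  if (typeof id !== 'number') throw new Error('id must be a number')
  if (typeof task_id !== 'string') throw new Error('task_id must be a string')
  // Assumption: created_at is stored as milliseconds since the Unix epoch.
  if (typeof created_at !== 'number') throw new Error('created_at must be a number')
  return { id, taskId: task_id, createdAt: new Date(created_at) }
}
```

Keeping the conversion in one function means a schema change shows up as a single compile-time or runtime failure rather than as scattered `undefined` values.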

Oh man, it's been two months since this PR was tested, right? I'm worried that, in the meantime, we've added or changed some task to conflict with this PR. Right...

Note that in this repo `task-standard/examples` only contains the `count_odds` task. Maybe that means this README isn't necessary? I do feel like we should move the example task somewhere else....

I've put off reviewing this one because, on the face of it, I didn't think a function on Driver should take a path to a file in the container. Instead,...