derpyplops

Results 3 issues of derpyplops

Solves NOT-291. This is quite a complex change, but it basically aims to train a reporter model per prompt, then evaluate it both on each individual prompt as well as...

Look at `tests/test_smoke_elicit.py` for a reference.

enhancement
good first issue