P.D. Reiter

Results 13 issues of P.D. Reiter

Based on @ChrisTimperley's feedback, issue #300
- Enables a new CLI interface (`darjeeling evaluate`)
- Tested, but seeing some issues with test timeouts (currently `ResourceUsageTracker(limits=None)`, so I may need...

- Tested with and without heldout-test content
- IMPACT: changes `TestOutcome` such that two cases, (without heldout tests) and (failing heldout tests), are indistinguishable

feature

During assessment, we need to extend the evaluation infrastructure to support heldout tests.

feature