Matthew Rowe comments

8 comments


Subselect probes by input length

Do we have a discrete list of such targets that can have their input lengths capped?

bug: attempt["outputs"] log has fewer entries than in detector_results

I will pick this up.

bug: attempt["outputs"] log has fewer entries than in detector_results

Found the root cause of this bug for the `atkgen.Tox` probe when using the `ToxicCommentModel` detector. The issue is that when conversational LLM interactions are used to generate attack responses, only the assistant's...

probe: content compliance

This looks like a really worthwhile addition to the probe stack. A few questions on implementation:
- Would this all be handled in a single probe, or per-task-type probes and detectors...

probe: token smuggling

I will pick this up.

Changed Attempt.outputs to return all assistant outputs

It sounds like this PR should be held until the intended behaviour of `attempts.output` is clearer, as this could be an edge case involving `atkgen`.

Changed Attempt.outputs to return all assistant outputs

@leondz should we park this until the refactor and then potentially delete it? It feels like it might be a redundant change.

Changed Attempt.outputs to return all assistant outputs

@jmartin-tech let me know if any changes are needed here; happy to update the branch/PR.
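The PR title describes changing `Attempt.outputs` to return all assistant outputs. A minimal sketch of that idea follows, assuming a conversation is stored as a list of turns with a role and content. The `Turn` class and both helper functions are hypothetical illustrations, not garak's actual data model or API. The "last message only" variant is included purely for contrast, and is an assumption about how outputs could end up shorter than the per-turn detector results:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """Hypothetical conversation turn; not garak's actual data model."""
    role: str      # e.g. "user" or "assistant"
    content: str


def last_assistant_output(conversation: List[Turn]) -> List[str]:
    """Assumed contrasting behaviour: keep only the final assistant message."""
    for turn in reversed(conversation):
        if turn.role == "assistant":
            return [turn.content]
    return []


def all_assistant_outputs(conversation: List[Turn]) -> List[str]:
    """Return every assistant message in order, giving one output per
    assistant turn that a detector might score."""
    return [turn.content for turn in conversation if turn.role == "assistant"]


if __name__ == "__main__":
    convo = [
        Turn("user", "prompt 1"),
        Turn("assistant", "reply 1"),
        Turn("user", "prompt 2"),
        Turn("assistant", "reply 2"),
    ]
    print(last_assistant_output(convo))  # ['reply 2']
    print(all_assistant_outputs(convo))  # ['reply 1', 'reply 2']
```

With multi-turn generation, keeping only the last message yields one output while a detector scoring each assistant turn yields several results, so returning all assistant outputs keeps the two lists the same length.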