Matthew Rowe

8 comments by Matthew Rowe

Do we have a discrete list of such targets that can have their input lengths capped?

Found the root cause of this bug for the `atkgen.Tox` probe when using the `ToxicCommentModel` detector. The issue is that when one uses conversational LLM interactions to generate attack responses, only the assistant's...

This looks like a really worthwhile addition to the probe stack. A few questions on implementation: - Would this all be handled in a single probe, or per-task-type probes and detectors...

I will pick this up.

Sounds like this PR needs to be held until `attempts.output` is clearer, as this could be an edge case involving `atkgen`.

@leondz, should we park this until the refactor and then potentially delete it? It feels like it might be a redundant change.

@jmartin-tech let me know if any changes are needed here; I'm happy to update the branch/PR.