Matthew Rowe
Do we have a discrete list of such targets that can have their input lengths capped?
I will pick this up.
Found the root cause of this bug for the `atkgen.Tox` probe when using the `ToxicCommentModel` detector. The issue is that when conversational LLM interactions are used to generate attack responses, only the assistant's...
This looks like a really worthwhile addition to the probe stack. A few questions on implementation: - Would this all be handled in a single probe, or per-task-type probes and detectors...
I will pick this up.
Sounds like this PR needs to be held until `attempts.output` is clearer, as this could be an edge case involving `atkgen`
@leondz should we park this until the refactor, and potentially delete it? It feels like it might be a redundant change.
@jmartin-tech let me know if any changes are needed here; happy to update the branch/PR.