PyRIT
The Python Risk Identification Tool for generative AI (PyRIT) is an open access automation framework to empower security professionals and machine learning engineers to proactively find risks in their...
## Description Added scorers to support the crescendo attack strategy. This includes a self-ask conversation objective score, which assesses whether a particular response indicates that the objective was...
## Description ## Tests and Documentation
## Description Implement the first part of the crescendo attack strategy. So far, this adds the relevant scorers and prompt templates. ## Tests and Documentation Added unit tests and scoring notebook...
## Description Adding the code for Crescendomation to PyRIT ## Tests and Documentation
Hi, we improved the quality of the WMDP dataset last week. You can refer to the updated dataset on the [Hugging Face page](https://huggingface.co/datasets/cais/wmdp) or on our [GitHub repo](https://github.com/centerforaisafety/wmdp)....
#### Is your feature request related to a problem? Please describe. Many-shot jailbreaking as described in https://www.anthropic.com/research/many-shot-jailbreaking is not yet available in PyRIT. #### Describe the solution you'd like From...
#### Is your feature request related to a problem? Please describe. The following paper describes a technique called Tree of Attacks with Pruning (TAP): https://arxiv.org/pdf/2312.02119.pdf PyRIT doesn't currently support it....
#### Is your feature request related to a problem? Please describe. The suffix attack from https://arxiv.org/pdf/2307.15043.pdf is not yet supported by PyRIT. More examples are on their webpage https://llm-attacks.org/ ####...
#### Is your feature request related to a problem? Please describe. The fuzzing technique described in https://arxiv.org/pdf/2309.10253.pdf is not yet supported in PyRIT. #### Describe the solution you'd like The...
#### Is your feature request related to a problem? Please describe. When enabling `verbose` mode of an orchestrator (e.g., `EndTokenRedTeamingOrchestrator`) the printed logs are sometimes tricky to read as it's...