TASK: improve docstrings completeness and quality across PyRIT modules
For people who want to contribute/help with this issue:
- Follow the general guidelines for contributing to PyRIT: https://azure.github.io/PyRIT/contributing/README.html
- Pick one or more modules from the list below and comment below which module(s) you're working on to avoid duplication of effort
- Remove the module(s) from
tool.ruff.lint.per-file-ignoresinpyproject.tomlso that docstring issues are no longer ignored there - Run
ruff checkorpre-commit run ruff-check --all-filesto see what issues to fix - Fix the issues
- Verify completion:
ruff checkshould report "All checks passed!" - Submit a PR referencing this issue
Here's a list of modules with the number of docstring issues in each:
- [x]
pyrit/analytics: 5 @riyosha - [x]
pyrit/auth: 12 @riyosha - [x]
pyrit/chat_message_normalizer: 13 @riyosha - [x]
pyrit/cli: 2 @riyosha - [x]
pyrit/common: 47 @paulinek13 - [x]
pyrit/datasets: 34 @paulinek13 - [ ]
pyrit/embedding: 5 @riyosha - [ ]
pyrit/exceptions: 26 @riyosha - [ ]
pyrit/executor: 92 - [ ]
pyrit/memory: 154 - [ ]
pyrit/models: 146 - [ ]
pyrit/prompt_converter: 387 - [ ]
pyrit/prompt_normalizer: 15 @paulinek13 - [ ]
pyrit/prompt_target: 209 @paulinek13 - [ ]
pyrit/scenarios: 9 - [ ]
pyrit/score: 117 - [ ]
pyrit/setup: 18
The issues are mostly related to missing docstrings for public classes/methods/functions as well as missing parameter/return documentation.
Original post from 07XI25 "SUGGESTION: enforce docstring consistency in PyRIT with Ruff":
Related issue: #653
In short, this issue proposes adding Ruff as a dev dependency to enforce documentation consistency in PyRIT.
Why
PyRIT currently lacks automated verification of docs consistency.
The proposed solution will automatically check for the docstring requirements defined in the project's style guide (.github/instructions/style-guide.instructions.md), including:
- Google-style docstring format
- Type information in parameter descriptions
- Documented return values
- Documented exceptions (Raises sections)
- Triple quotes for all docstrings
It will also ensure that, for example, new public classes/methods/functions actually are documented to prevent missing entries in the docs.
Of course, things like that can be manually reviewed during PR reviews, but this could be easily automated + this change makes it now easier in fixing #653, I think.
Proposed Solution
As stated above, the proposal is to add Ruff as a new dev dependency, configure it properly to match as much of the existing style as possible, and add ruff check pyrit/ to the .pre-commit-config.yaml.
https://pypi.org/project/ruff/ https://docs.astral.sh/ruff/
In future ruff could replace flake8/isort/pylint in PyRIT, but for now, I'm proposing to add it just for docs checks.
More info
ruff check pyrit\ takes about 200-600 ms, so it's super fast.
Currently, there are about 1700 "errors":
config I've used:
[tool.ruff.lint.pydocstyle]
convention = "google"
[tool.ruff.lint]
preview = true
select = [
"D", # https://docs.astral.sh/ruff/rules/#pydocstyle-d
"DOC" # https://docs.astral.sh/ruff/rules/#pydoclint-doc
]
ignore = [
"D100", # Missing docstring in public module
"D200", # One-line docstring should fit on one line
"D205", # 1 blank line required between summary line and description
"D212", # Multi-line docstring summary should start at the first line
"DOC502", # Raised exception is not explicitly raised: {id}
]
extend-select = [
"D204", # 1 blank line required after class docstring
"D213", # Multi-line docstring summary should start at the second line
"D401", # First line of docstring should be in imperative mood: "{first_line}"
"D404", # First word of the docstring should not be "This"
]
I'm all for this. It should be part of the pre commit config. We should at the same time evaluate which parts of the existing config are no longer useful. However, fixing 1800 issues is probably going to take a little while. I wonder if we can do it in batches?
Thanks for the feedback. We can certainly do this in batches.
Comment content:
I've divided the docstring errors into two main categories:
- formatting / whitespace / style -- easy to fix in one go across the repo
118 D202 [*] blank-line-after-function
105 D213 [*] multi-line-summary-second-line
44 D411 [*] no-blank-line-before-section
20 D410 [*] no-blank-line-after-section
3 D209 [*] new-line-after-last-paragraph
1 D214 [*] overindented-section
6 D301 [ ] escape-sequence-in-docstring
2 D300 [*] triple-single-quotes
3 D403 [*] first-word-uncapitalized
3 D416 [*] missing-section-name-colon
80 D415 [ ] missing-terminal-punctuation
- public API coverage / completeness / language style -- these will be addressed in separate PRs per module(s) to keep changes manageable
244 D102 [ ] undocumented-public-method
71 D107 [ ] undocumented-public-init
62 D101 [ ] undocumented-public-class
44 D417 [ ] undocumented-param
35 D103 [ ] undocumented-public-function
33 D104 [ ] undocumented-public-package
27 D105 [ ] undocumented-magic-method
2 D106 [ ] undocumented-public-nested-class
191 DOC501 [ ] docstring-missing-exception
167 DOC201 [ ] docstring-missing-returns
11 DOC202 [ ] docstring-extraneous-returns
1 DOC402 [ ] docstring-missing-yields
21 D404 [ ] docstring-starts-with-this
8 D418 [ ] overload-with-docstring
471 D401 [ ] non-imperative-mood
The plan is as follows:
-
I will start with a PR that adds Ruff as a dev dependency along with the initial doc-related config. The same PR will also fix all errors from the first group across the repo (as they are mostly "mechanical" changes and easy to fix in one go).
-
Subsequent PRs would address the second group of errors broken down by modules to keep the PRs manageable. These PRs would need to ensure that all Ruff docstring errors in the targeted module(s) are fixed (
ruff check pyrit/module1 pyrit/module2-> "All checks passed!") -
Once all the above is done, we can add Ruff to the precommit so that future PRs are blocked if they introduce docs issues.
As for the second point, I think we could track the overall progress of this effort. Maybe a checklist of modules to be done, so (new) contributors can pick/claim one (or more) and work on it.
Does this sound like a good approach?
Sounds fantastic. I think you can just add a "list" to the original post and I'll add a tag with "help wanted" so people can find this.
@romanlutz
Do you think we could ignore docstring issues in pyrit/ui and pyrit/auxiliary_attacks since they are not exposed in the public API Reference?
They should be in the API reference. It's mostly an oversight that they're not... You can ignore them for the moment (if it's possibly to exclude directories) and I'll get to that at some point.
Hi I'd like to contribute! I'm working on 2bmodules initially:
-
pyrit/analytics: 5
-
pyrit/auth: 12
I'm waiting for 1181 to be merged before I raise a PR. Thanks!
@riyosha I just merged #1181 so feel free to pull the changes and get started š
@riyosha I just merged #1181 so feel free to pull the changes and get started š
Thanks! Just raised a PR. I will also take up modules
- pyrit/chat_message_normalizer
- pyrit/cli
- pyrit/embedding
- pyrit/exceptions
if the PR looks okay!
Everything looks fantastic! I reviewed and merged the open PRs. Excellent work, thank you both for your contributions š Looking forward to move more of the validation to ruff.
Hi! Iām a student and interested in contributing to this issue. I can work on pyrit/scenarios: 9.
@eve1ne please go ahead!
@eve1ne please go ahead!
Thanks! pyrit/scenarios already had all checks passing, so I moved on to pyrit/setup (18 issues) and just opened a PR for those fixes. If everything looks good with that PR, I can take on the pyrit/executor (92 issues) next.
Sounds good! I would only recommend holding off on "prompt_target" until I'm done with #1194 because a lot will change there.
Hi, i'm a student and I would like to help with pyrit/memory. I can start working on the docstrings for this module.
Hi! I just raised a PR for pyrit/executor, if everything looks good with that I can move on to pyrit/models.
It looks great! Feel free to move on to the next one.