gherkin Format generated Python parser module

trafficstars

🤔 What's changed?

Updated parser generation code with Python formatting as per style guidelines
Invoked no-operation Token.detach method in generated parser code
Migrated Python formatter from black to ruff
Applied refactorings to generated parser code
- Reorganised imports - I001 (ruff check gherkin/parser.py --select=I001)
- Dropped or False logic from conditionals - RET505 (ruff check gherkin/parser.py --select=RET505)
- Replace series of or statements with any call
- Drop else statement and nesting proceeding by conditionals that would return from the enclosing function

⚡️ What's your motivation?

Ensures entire Python implementation is formatted consistently - building on #286
- Improves code review as pull requests less-susceptible to stylistic changes
- Simplifies configuration - removes exclusion of incompliant modules
- Improves readability
  - Bulk of change reduces mix of 4 and 8 space indentation within match_token_at_'x' methods to just 4
    - Was difficult to read and was most-likely a workaround in the generator code to satisfy generating code for two cases: to run inside conditionals (with 4 spaces) and outside conditionals (8 spaces) without having to handle within the generator; a SyntaxError would be raised if standardised to 4 spaces without handling (as per this pull request)
Migrating formatting to ruff has a number of advantages - including performance, and enabling standardising to a single toolset for linting and formatting
- ruff is monorepo-friendly, with hierarchical and cascading configuration - which is ideal for this repository: allowing configuration to be stored within the python directory while allowing pre-commit, IDEs and commands run from the root of the repository
- Aligns formatting development experience with pytest-bdd (pytest-dev/pytest-bdd#758)
General refactorings to improve code quality and simplify code generation
- or False is redundant - assume added as a safeguard where the proceeding args are not present (defaults to just ‘False’ in that case) pots generation - however the unit tests protect against this, failing if the args aren’t generated
- Easier to generate any calls against a comma-separate iterable compared to a series of or statements where it can't be applied on every line
- superflous else is redundant (superflous-else-return - RET505)
- Generated parser code currently contains a useless-expression (B018) token.detach - updated code invokes appropriately though this doesn't actually alter behaviour (being a no-operation method) though conforms to intention and other parser implementation

🏷️ What kind of change is this?

:bank: Refactoring/debt/DX (improvement to code design, tooling, etc. without changing behaviour)

♻️ Anything particular you want feedback on?

Thoughts whether a CHANGELOG entry is warranted?
To evaluate changes are formatted correctly:

Open a codespace (or locally alternatively):

Regenerate all parser implementations:
```
docker build --tag berp-env .
docker run --rm --interactive --tty --volume ".:/app" berp-env
make clean-generate
make generate
```
From a separate terminal window, install the pre-commit hook and validate all Python code is formatted (including the generated parser code):
```
pip install pre-commit
pre-commit install
pre-commit run --all-files ruff-format
```
Run Python unit and acceptance tests to validate no regression in the generated parser implementation:
```
pip install pytest
pytest
cd python/
make acceptance
```
Have skipped formatting on state_comment string
- Challenging to generate formatted Python multiline string from C# - consider exclusion pragmatic
Intend to update GitHub Actions workflow in a follow up pull request to validate, or fully assess use of pre-commit-ci with this polyglot repo

📋 Checklist:

[x] I agree to respect and uphold the Cucumber Community Code of Conduct
[x] My change requires a change to the documentation.
- [x] I have updated the documentation accordingly.
[x] Users should know about my change
- [ ] I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.

Jan 04 '25 22:01 kieran-ryan

I’m 👎 for this change:

self.ast_builder = ast_builder if ast_builder is not None else AstBuilder() -> self.ast_builder = ast_builder or AstBuilder()

the reason is that you should be able to subclass AstParser and override __bool__ to return False, and the above code should still use your object.

Jan 05 '25 12:01 youtux

I’m 👎 for this change:

self.ast_builder = ast_builder if ast_builder is not None else AstBuilder() -> self.ast_builder = ast_builder or AstBuilder()

the reason is that you should be able to subclass AstParser and override __bool__ to return False, and the above code should still use your object.

Makes sense - applied! Had been unsure on the same for a somewhat related reason - being too implicit by assessing "truthiness" rather than the explicit type declarations - which could become a consideration with changes down the line should users begin passing additional values. Accustomed to using in a local context without this consideration. Interesting, thank you.

Jan 05 '25 22:01 kieran-ryan

Given this is a big refactor how far down the line is it. Is it close to being merge-worthy? @kieran-ryan as we're doing a lot of other big changes in other flavours so would be good to get this in if we can

Feb 08 '25 15:02 luke-hill

I've fixed the remarks around not exiting from the loop early. The implementation is a bit awkward as python doesn't have a do-while construct, but this should work and seems to satisfy ruff.

Jun 30 '25 11:06 mpkorstanje

gherkin gherkin copied to clipboard

Format generated Python parser module

🤔 What's changed?

⚡️ What's your motivation?

🏷️ What kind of change is this?

♻️ Anything particular you want feedback on?

📋 Checklist:

gherkin
gherkin copied to clipboard