Entrypoints and new gandlf cli command
Fixes #610.
Proposed Changes
- [x] [base PR] replaced
gandlf_*scripts in favor of entrypoints. Thus, user is still able to rungandlf_constructCSV, but now it can be done from any working directory. - [x] [base PR] Add unified
gandlfcli tool in order to replace separategandlf_*entrypoints. Based on click.gandlf_*entrypoints are marked as deprecated. Different scripts are converted to subcommands:gandlf update-version -ov 0.18 -nv 0.19instead ofpython ./gandlf_updateVersion -ov 0.18 -nv 0.19 - [x] [base PR] Added a couple of common args for all subcommands:
gandlf --version, no need for subcommandgandlf --loglevel DEBUG|INFO|WARNING|ERROR run/path-miner/anonymizer/...---loglevelparam is common for all subcommands.
- [x] [sub-PRs] Unifies cli arguments for different scripts / subcommands:
- [x] naming style: all params and subcommands are renamed to
--snake-caseas this is a standard way of naming in CLI tools (docker, aws cli, [gunicorn] (https://docs.gunicorn.org/en/stable/run.html#commonly-used-arguments)). Example:gandlf collect-stats --model-dir model/ --output-dir output/ - [x] typing & validation:
- [x] all boolean args are converted to flags (so, you don't need to provide
True/False; instead ofgandlf_run --train True --reset Trueusegandlf run --train --reset. Instead ofgandlf_run --train False --reset Falseusegandlf run --infer(no -rt passed at all). - [x] file / dir args are automatically validated by
click: if arg may point to file, if arg may point to dir, if arg path must exist - [Won't be done] ~ parameter names for input/output are unified (originally some scripts wait for
--input, other for--input-pathor--input-file; same for output / config parameters)~
- [x] all boolean args are converted to flags (so, you don't need to provide
- [x] naming style: all params and subcommands are renamed to
- [x] [sub-PRs] added tests for both new and old cli entrypoints.
- [x] [sub-PRs] Fixed all docs & other non-code entities to replace old scripts to new CLI tool. Ex:
python ./gandlf_collectStats->gandlf collect-stats
As all the changes altogether with new CLI tool creation is very massive (a lot of tests + code duplication by supporting both old way argparse + new way click), to simplify review process it is split onto:
- base PR (this one) includes
- moving
gandlf_*scripts to independent entrypoints - introducing a new
gandlfentrypoint as a stub (supports only common args like--versionand--loglevel) - adding common test utilities for all the future CLI subcommands
- moving
- separate sub-PRs: one PR for every subcommand from its branch to the current
entrypoint_base. Every PR include- adding a new subcommand
- its tests
- doc fixes.
Every PR tries to keep existing entrypoint logic as much as possible (even when it's obvious for me it can be improved). Thus, old entrypoints
gandlf_collectStatswould work as usual, while new cli tool behavior may differ a bit.
Checklist
This PR should be reviewed separately from sub-PRs. However, IMO, we should merge it only after all sub-PRs are reviewed and merged into this one.
Every point except sub-PRs should be implemented in this PR.
- [x] Move scripts to entrypoints. NB: breaking change. No scripts exist in root folder anymore; if package is just copied, not installed with
setuptools/pip/condaetc, no entrypoints would be available. - [x] Add root
gandlfcli tool - [ ] sub-PRs are approved. Every PR has to have tests implemented + doc updated. NB: partially breaking change: old cli commands may differ from new ones as params are renamed.
- [x]
gandlf anonymizer: https://github.com/VukW/GaNDLF/pull/1 - [x]
gandlf collect-stats: https://github.com/VukW/GaNDLF/pull/2 - [x]
gandlf config-generator: https://github.com/VukW/GaNDLF/pull/3 - [x]
gandlf construct-csv: https://github.com/VukW/GaNDLF/pull/4 - [x]
gandlf debug-info: https://github.com/VukW/GaNDLF/pull/5 - [ ]
gandlf deploy: https://github.com/VukW/GaNDLF/pull/6 - [ ]
gandlf generate-metrics: https://github.com/VukW/GaNDLF/pull/7 - [ ]
gandlf optimize-model: https://github.com/VukW/GaNDLF/pull/8 - [ ]
gandlf patch-miner: https://github.com/VukW/GaNDLF/pull/9 - [ ]
gandlf preprocess: https://github.com/VukW/GaNDLF/pull/10 - [ ]
gandlf recover-config: https://github.com/VukW/GaNDLF/pull/11 - [ ]
gandlf run: https://github.com/VukW/GaNDLF/pull/12 - [ ]
gandlf update-version: https://github.com/VukW/GaNDLF/pull/13 - [ ]
gandlf verify-install: https://github.com/VukW/GaNDLF/pull/14 - [ ]
gandlf split-csv: https://github.com/VukW/GaNDLF/pull/15
- [x]
- [x] new tests are added to github workflow
- [x] Test docker images: test they are building and run successfully. UPD: is done by regular CI tests
- [x] ~
.zipfiles used for testing contain some mlcubes with predefined entrypoints likepython ./gandlf_run. These MLCubes should be updated => all.zipshould be recreated.~ UPD: no need to update testing data, it wasn't changed. - [x]
CONTRIBUTINGguide has been followed. - [x] PR is based on the current GaNDLF master .
- [x] Function/class source code documentation added/updated (ensure
typingis used to provide type hints, including and not limited to usingOptionalif a variable has a pre-defined value). - [x] Code has been blacked for style consistency and linting.
- ~[Not applicable] If applicable, version information has been updated in GANDLF/version.py.~
- ~[Not Applicable] If adding a git submodule, add to list of exceptions for black styling in pyproject.toml file.~
- [x] Usage documentation has been updated to use old-style entrypoints (
gandlf_run ...) instead of python scripts (python ./gandlf_run ...). - [x] ~[After merging sub-PRs]~[Is actually fixed in sub-PRs] Usage documentation has been updated to use new-style CLI tool (
gandlf run) instead of old-style entrypoints (gandlf_run ...). NB: all documentation concerning specific command (say, list of args forgandlf run) must be updated in sub-PR, not here. - [x] [No tests added or modified]. *There were no tests for
gandlf_*scripts, so nothing to modify; cli tool itself (gandlf) is not covered by tests, thus for now test cov would lower; however, all subcommands (gandlf runas well asgandlf_run) must be covered by tests in the appropriate sub-PR. - ~[Not Applicable] If customized dependency installation is required (i.e., a separate
pip installstep is needed for PR to be functional), please ensure it is reflected in all the files that control the CI, namely: python-test.yml, and all docker files [1,2,3].~
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅
Codecov Report
Attention: Patch coverage is 47.67932% with 124 lines in your changes are missing coverage. Please review.
:exclamation: No coverage uploaded for pull request base (
new-apis_v0.1.0-dev@0c878af). Click here to learn what that means.
:exclamation: Current head 81cd10c differs from pull request most recent head 09b052b. Consider uploading reports for the commit 09b052b to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## new-apis_v0.1.0-dev #818 +/- ##
======================================================
Coverage ? 92.11%
======================================================
Files ? 142
Lines ? 8904
Branches ? 0
======================================================
Hits ? 8202
Misses ? 702
Partials ? 0
| Flag | Coverage Δ | |
|---|---|---|
| unittests | 92.11% <47.67%> (?) |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
👍🏽
Marking as draft until ready for review. 😄
As updateVersion actually is not meant to be used by real users, I removed it from cli tools. Instead, I brought this script back to root directory and renamed.
Now, to use the script, one needs to run python update_version.py -ov 0.19 -nv 0.20 (running scripts through python is more straightforward and explicit way)
As
updateVersionactually is not meant to be used by real users, I removed it from cli tools. Instead, I brought this script back to root directory and renamed. Now, to use the script, one needs to runpython update_version.py -ov 0.19 -nv 0.20(running scripts through python is more straightforward and explicit way)
Sounds good. This needs to be updated in the developer wiki when merged.
Failed test: test coverage reduced
Test coverage dropped significantly: from 95.01% to 90.20%. For better understanding, here is a report from the previous PR to the master branch vs current PR report.
However, the reason is that in master branch gandlf_* scripts were omitted; now the code is moved to GANDLF/entrypoints/*.py (and ./update_version.py), thus, included to the coverage. There is no tests in the master branch for these scripts.
Sub-PRs add some tests on existing entrypoints code, however, it covers only arg parsing, and not additional logic (like plotting logic in collect_stats.py)
Failed test: Codacy static code analysis
The new issues that codacy shows belong to scripts that were moved from gandlf_* to GANDLF/entrypoint/*.py. As this is actually an old code, I don't want to make changes here; Instead, I believe, we can fix codacy issues in according sub-PRs (say, issues located in GANDLF/entrypoints/construct_csv.py should be solved in constuct-csv sub-PR). Another option is to address these "new" issues later, as a separate PRs after this one is merged.
@sarthakpati As sub-PRs are going to be merged here later, it would be great if you can review the content of this PR before "contamination":)
Since this is a fairly large and substantial PR, I am going to request @szmazurek to review from along with me.
Also, tagging GaNDLF before this gets merged (we will have another after this gets merged and move to version 0.1.0, since this is such a major change). 😄
@sarthakpati @szmazurek Folks, thank you for your notes! As this PR looks like good enough (except of the question with duplicating entrypoints list in testing), may I ask you to start review on a couple of sub-PRs? It would also give you a better understanding how implementation & testing of specifc subcommands relates to this base PR. I'd propose to start with these ones:
- anonymizer: https://github.com/VukW/GaNDLF/pull/1
- collect stats: https://github.com/VukW/GaNDLF/pull/2
- config generator: https://github.com/VukW/GaNDLF/pull/3
All sub-PRs are structured in the same way.
Implementation in GANDLF/entrypoints/XXX.py
- added a click function that would parse CLI command args in
gandlf subcommand --foo bar. - as much logics as possible is moved from
mainfunction to some function-wrapper. This wrapper makes a common checks or transformations needed over input args params. At the end this wrapper calls a main, real function in GaNDLF code that would actually do the job. - all transformations / checks that relate to old-way
argparsescript is left inmain. Example: check that given path exists (this is not necessary for click subcommand as click validates it internally). - for click subcommand params are renamed to snake-case as it is default way for unix cli tools.
Overall, in every sub-PR, "old" version of running gandlf_* entrypoint would behaves exactly in the same way as it works right now. "New" versions via gandlf subcomand ... may change its naming, but should finally execute exactly the same real gandlf code, with the same parsed args.
NB: as I tried to keep exactly the same behavior for old entrypoints, I didn't fix any issues in old code even if it's obvious; the goal was to make a new CLI behavior consistent with old one, not improve the subcommands logic. However, we can discuss it in in-place: if you believe it's safe to make any fixes & improvements to old code, we are welcome to do it.
tests in testing/entrypoints/test_XXX.py
Tests define a list of old-way shell commands and a new-way, and check all commands execute real code with the same expected_args => thus, test both click arg parser and oldish argparse simultaneously.
docs / examples / etc
... are updated to use gandlf subcommand ... instead of gandlf_subcommand ... entrypoint.
After you check these first sub-PRs, I would be able to update others if any common design flaws are found.
And one more sorry for such a big pile of code to be reviewed 😄
Would also be nice to have some documentation to add a new CLI module under the developer notes.
I vote to move to 0.1.0 after this PR gets merged rather than 0.0.20, since it is going to be a pretty big API change.
I vote to move to
0.1.0after this PR gets merged rather than0.0.20, since it is going to be a pretty big API change.
Both are ok for me; let's move to 0.1.0 then
@sarthakpati Fixed / commented all previously mentioned issues, the PR (and sub-PRs) are ready for the next review batch.
By the way: right now PR title check fails because the regexp ^[A-Z][a-zA-Z]*(?<!s)( [a-zA-Z]+)+[^.]$ prohibits the first word (Entrypoints here) to be ended with s. I feel being an Indiana Jones, catched by one of invisible and ancient traps. Is it ChatGPT so insidious, or is it an intended check?
@sarthakpati Fixed / commented all previously mentioned issues, the PR (and sub-PRs) are ready for the next review batch.
By the way: right now
PR titlecheck fails because the regexp^[A-Z][a-zA-Z]*(?<!s)( [a-zA-Z]+)+[^.]$prohibits the first word (Entrypointshere) to be ended withs. I feel being an Indiana Jones, catched by one of invisible and ancient traps. Is it ChatGPT so insidious, or is it an intended check?
🤣
I think that was unintended. Can you post an update to the regexp in a separate PR?
Dears, I viewed the changes in this PR and sub-prs @VukW mentioned, they look ok for me. I also agree with the version move into 0.1.0, quite a lot of changes made.
I have changed the base for this PR to a new branch where we shall keep changes related to the new API.
@VukW: can you fix the codacy issues before we merge this in?