GaNDLF [DRAFT] Creating a separate branch for changes related to new APIs

Fixes #ISSUE_NUMBER

Proposed Changes

DO NOT MERGE THIS IN!!!

Checklist

[ ] CONTRIBUTING guide has been followed.
[ ] PR is based on the current GaNDLF master .
[ ] Non-breaking change (does not break existing functionality): provide as many details as possible for any breaking change.
[ ] Function/class source code documentation added/updated (ensure typing is used to provide type hints, including and not limited to using Optional if a variable has a pre-defined value).
[ ] Code has been blacked for style consistency and linting.
[ ] If applicable, version information has been updated in GANDLF/version.py.
[ ] If adding a git submodule, add to list of exceptions for black styling in pyproject.toml file.
[ ] Usage documentation has been updated, if appropriate.
[ ] Tests added or modified to cover the changes; if coverage is reduced, please give explanation.
[ ] If customized dependency installation is required (i.e., a separate pip install step is needed for PR to be functional), please ensure it is reflected in all the files that control the CI, namely: python-test.yml, and all docker files [1,2,3].

Apr 09 '24 17:04 sarthakpati

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Apr 09 '24 17:04 github-actions[bot]

Codecov Report

Attention: Patch coverage is 94.25386% with 67 lines in your changes missing coverage. Please review.

Project coverage is 94.61%. Comparing base (5e43783) to head (5652542).

Files	Patch %	Lines
testing/entrypoints/__init__.py	84.72%	22 Missing :warning:
GANDLF/entrypoints/collect_stats.py	77.50%	9 Missing :warning:
update_version.py	57.14%	9 Missing :warning:
GANDLF/config_manager.py	64.70%	6 Missing :warning:
GANDLF/data/augmentation/__init__.py	73.33%	4 Missing :warning:
GANDLF/entrypoints/verify_install.py	86.95%	3 Missing :warning:
GANDLF/entrypoints/construct_csv.py	96.49%	2 Missing :warning:
GANDLF/entrypoints/optimize_model.py	93.10%	2 Missing :warning:
GANDLF/cli/deploy.py	0.00%	1 Missing :warning:
GANDLF/entrypoints/anonymizer.py	97.95%	1 Missing :warning:
... and 8 more

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #845      +/-   ##
==========================================
- Coverage   95.04%   94.61%   -0.44%     
==========================================
  Files         122      160      +38     
  Lines        8297     9450    +1153     
==========================================
+ Hits         7886     8941    +1055     
- Misses        411      509      +98

Flag	Coverage Δ
unittests	`94.61% <94.25%> (-0.44%)`	:arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Apr 09 '24 18:04 codecov[bot]

Hi folks! We have a couple steps before PR can be merged:

fix & merge https://github.com/mlcommons/GaNDLF/pull/853 where some output param names are changed. @scap3yvt
Despite all the new code is covered by tests, the overall coverage was dropped as we included entrypoint scripts to be calculated for coverage. We can either add some new tests for that or decrease coverage threshold. @sarthakpati what do you think?

Apr 22 '24 17:04 VukW

Apologies, somehow I missed this comment, @VukW!

fix & merge Updated CLI options for new API #853 where some output param names are changed. @scap3yvt

This should be done.

Despite all the new code is covered by tests, the overall coverage was dropped as we included entrypoint scripts to be calculated for coverage. We can either add some new tests for that or decrease coverage threshold. @sarthakpati what do you think?

How difficult would it be to add new tests? To note that this branch is supposed to stay as a separate branch (and not get merged into master) for at least a few months.

Apr 29 '24 13:04 sarthakpati

To note that this branch is supposed to stay as a separate branch (and not get merged into master) for at least a few months.

@sarthakpati 😱Why so? It's not a big problem to add a new test, but it may be disturbing to support two branches (this and main) simultaneously - especially if we'd need to fix / improve anything in entrypoints scripts (that's highly probable on the range of a few months)

Apr 29 '24 13:04 VukW

I agree (and the idea to delay the merge is more of a strategic endeavor than anything else). Do you think it makes sense to move the master the newer API and keep a "legacy" branch? I am honestly agnostic to either, just that we need to delay the actual release of the new API branch for a few months.

Apr 29 '24 14:04 sarthakpati

Can you plz describe the strategic reason a bit wider as I don't get it quite well? I mean, depending on why exactly we want to wait without merging, I am imaginating different possible solutions. But as this PR is not breaking old behavior, in most of cases merging it looks better for me rather then supporting two branches. Different options I do see:

a. Merge as-is. We do not break old-way behavior however we distantiate from supporting it, noting that in future all new improvements and features would be implemented in new-way CLI. b. Mark new-way CLI as experimental and old-day scripts as stable and merge. Thus, users would still use old-versioned entrypoint scripts, but it would be much easier for us to implement new features both in old-way and new-way commands. c. Postpone merging for a couple of weeks..one month if there are any specific issues / notes that we want to introduce together with a newer CLI. d. Decline the whole PR / new CLI API

Say, we have the following confusions:

"We are introducing breaking changes to API? Wow, great! Then lets also fix X and Y and Z and maybe smth else right now, we don't want our users to change their scripts one more time in the future!". Say, standartizing input/output param names. IMO best solutions are (c) (postpone for a short time if we already know what should be fixed) or (b) (add new API as experimental and play with it to understand what can be improved)
"The whole PR looks like too complex... What if we break something? What if we break user behavior?" As PR is designed as non-breaking, we ensure users can still run old scripts as previously (in particular, unit tests assure it), so we may do (a). However if we still unconfident, we can do (b), marking new-way as experimental.
"New CLI code looks more complex for developers, it would be harder to understand it for new developers" Not arguing with subjectivity / objectivity of the reason (anyway, it's just an example), there might be cases when we are confused with the basic foundation of this PR. In such a case, (d) - declining everything - looks like a reasonable solution for stopping over-engineering.
"If merging new CLI, then we need to bump version to 1.0, but this requires to refactor all other code". In this case it seems to me more feasible to eat it piece by piece - say, merge new CLI (a) as 0.1.0, but not bumping major version (1.0.0) until everything is fixed

So, for me it seems like merging the new CLI or declining it totally is always better (from the terms of supportability) then keeping two branches. And if keep two branches - it might be possible, but for some finite period of time until we decide what to do further. That's why I'm asking to understand better a specific reason why / what do we want to do with this in future:)

Apr 29 '24 15:04 VukW

@VukW: Wow, that was elaborate 😄

Anyway, the reason behind having a separate branch was to delay the actual "release" (i.e., into a tag) of the new API for a few months. But your assessment is completely spot-on. Considering we are a tight knit group of contributors, I vote for option (a), but after #842 has been merged in. The reason behind that is that it would be great to have a holistic logging functionality available along with the new APIs. Thoughts, @sylwiamm?

Apr 29 '24 17:04 sarthakpati

@sarthakpati 👍 So, we are not pushing new release / new tag / new wheel package for now, but we kindly ask for all new PRs to be based on this branch instead of master, right? In this way it's easier for us to merge any new features / fixes rather then merge them to master & solve conflicts after

Apr 29 '24 18:04 VukW

So, we are not pushing new release / new tag / new wheel package for now, but we kindly ask for all new PRs to be based on this branch instead of master, right?

Yes, precisely! But I would suggest let's wait for @sylwiamm to wrap up the logging work and then we proceed with this. Thoughts?

Apr 29 '24 19:04 sarthakpati

Hey @VukW, now that #893 has been merged, shall we merge this in and then point users who need the older API to https://github.com/mlcommons/GaNDLF/tree/older_apis?

Jul 24 '24 15:07 sarthakpati

Hmmm So, I do see the following tags right now:

0.0.19 tag, made in Feb 2024
current master, version 0.0.20-dev with a bunch of fixes over 0.0.19
old_apis == master + a deprecation notice in Readme
this branch, aimed to be merged to master and make a 0.1.0 version

Current master is in stable and working state, right? What do you think if we make two separate releases sequentially: first, release current master under 0.0.20 version, and then merge and release this new-api branch under 0.1.0 version? In this case people who depend on v0.0.19 would be able to update up to 0.0.20 automatically (without any backward incompatibilities) and to use some added features. Thus, we won't need to support a separate branch (that doesn't really match gandlf's default git flow) - instead we just mark current master with a 0.0.20 tag and that's aligned with all the previous and future releases

*If you are not sure master is stable enough right now, we can use your proposed solution - just keep this old_apis branch, without actual releasing it

Jul 24 '24 15:07 VukW

That's a great idea, thank you! I am going to get started on this.

Jul 24 '24 19:07 sarthakpati

That's a great idea, thank you! I am going to get started on this.

Update: #904 is ready for us to merge. This will allow us to tag the current master, following which we will merge this branch and move on.

Jul 24 '24 20:07 sarthakpati

@mlcommons/gandlf-write - there are multiple folks who have contributed to this PR (including all maintainers). Any proposals on what protocol to follow WRT reviews before merging?

Jul 25 '24 13:07 sarthakpati

@szmazurek I should have addressed all comments...

Jul 29 '24 16:07 sarthakpati