tedana
tedana copied to clipboard
[REF] Decision Tree Modularization
Closes #403, #808, #809 Supercedes #592 Changes proposed in this pull request: See #592 This replaces the inflexible decision tree in tedica.py with a modular structure that will allow for multiple default and user-defined decision trees along with a more interpretable and flexible system for tracking and understanding the results. Noteworthy implemented features / changes decision tree modularization:
- Main workflow in
tedana.py
is simplified and a separate workflowtedana_reclassify.py
can be used to manually change component classifications. (Removed 1/5 of the lines of code intedana.py
that are just used to handle the manual classification condition) - Minimal and kundu decision trees are fully functional
- Moved towards using the terminology of “Component Selection” rather than “Decision Tree” to refer to the code that’s part of the selection process. “Decision Tree” is still used to more specifically to refer to the steps to classify components.
-
ComponentSelector
object created to include common elements from the selection process including the component_table and information about what happens along every step of the decision tree. Additional information that will now be tracked and stored includes:-
component_table
has columns forclassification_tags
instead ofrationale
so that each decision tree can include multiple descriptive tasks per component. Also an executed tree should only includeaccepted
orrejected
components.ignored
components are nowaccepted
withclassification_tags
likeLow variance
orAccept borderline
-
cross_component_metrics
will save values calculated across components, such as the kappa & rho elbows -
component_status_table
contains how component classifications changed along every node of the decision tree -
tree
contains the executed tree along with everything calculated or changed during execution -
used_metrics
a list of all metrics used when running the tree. (Can potentially be used to calculate a subset of metrics given an inputted tree)
-
- The new class is defined in
./selection/ComponentSelector.py
, the functions that define each node of a decision tree are in./section/selection_nodes.py
and some key common functions used by selection_nodes are in./selection/selection_utils.py
- By convention, functions in selection_nodes.py that can change component classifications, begin with
dec_
for decision and functions that calculate cross_component_metrics begin withcalc_
- A key function in selection_nodes.py is
dec_left_op_right
which can be used to change classifications based on the intersection of 1-3 boolean statements. This means most of the decision tree is modular functions that calculate cross_component_metrics and then tests of boolean conditional statements.
- By convention, functions in selection_nodes.py that can change component classifications, begin with
-
io.py
is now used to output a registry (default isdesc-tedana_registry.json
) of all generated file names that can then be read in to load data rather than requiring follow-up programs, like RICA, to have multiple inputted file names. - New documentation
building_decision_trees.rst
gives a holistic view of everything in the component selection process - Some terminology changes, just as using
component_table
instead ofcomptable
in code Remaining work to do before merging - [ ] Improve documentation
- [x] Link
building_decision_trees.rst
to the rest of the documentation - [ ]
building_decision_trees.rst
is both a guide for developers and an explanation of outputs that any user might want. Make sure the user-guide parts are more accessible within the existing documentation - [ ]
rationale
and descriptions of rationale are in several places in the documentation. Remove all and replace with an explanation ofclassification_tags
- [ ] Make sure documentation and function docstrings render correctly. Proofread
- [x] Link
- [x] Triple check the kundu tree in
./resources/decusion_trees/kundu.json
matches the decision tree code in Main - [ ] Testing coverage is already very good. While this is fresh in memory, try to improve testing coverage more.
- [x] tedana reports
- [x] Add visualization of kappa and rho elbows to tedana reports
- [x] Add
classification_tags
to hover text
- [x] Take a look at the rho elbow for the minimal tree to see if it’s too aggressive (easier to do after updates to reports)
- [x] Change
provisionalreject
tounclassified
in the kundu decision tree since those are accepted if not rejected by other criteria - [ ] Get the CLI working for tedana_reclassify
- [x] @jbteves is working on mini-tools to directly compare the results in the current main with the results of this PR. This will include:
- Script(s) and instructions to make it easier to run both main & this PR on the same data
- Count and identify which components changed classifications between versions of the code and the % variance explained by those changes
-
ignored
classification in main should beaccepted
classification in this PR with one of the following classification tags: “Low variance” “Accept borderline” “No provisional accept”
Remaining work where we can really use more help
- [ ] Once the mini-tools are created, have multiple people run on multiple datasets to make sure it runs, gives plausible results, and users with various levels of expertise understand what’s happening.
- If you want to help with this, leave a comment or contact @handwerkerd
- Does Main match the output of the kundu decision tree?
- Is the minimal decision tree reliably more conservative? The minimal tree should accept some component that were rejected by the kundu tree. It’s possible the kundu tree will accept a few low variance components that are rejected by the minimal tree.
- [ ] After documentation is cleaned up, we will need both developers and non-developer users to read them for clarity.
There are several improvements that aren’t necessary before merging this PR which will be opened as stand alone issues. Links will be added to the issues as they are opened:
- Update RICA to work with the output of this PR and
tedana_reclassify.py
- #888 For built-in tedana reports, better use classification tags to allow for dynamic coloring or selection of components This would be better done by someone who is a more experienced bokeh coder
- #888 Write code to automatically create a flow chart or other summary/visualization for a decision tree. This can also be used to make a run-specific visualization that shows how components were reclassified at each node of the decision tree.
-
tedana_reclassify.py
currently just works for manually changing accepted and rejected classifications. Either that function or another can be used to run a follow-up decision tree. Figure out if there are potential use-cases for this functionality and then update code. - Harmonize terminology better between selector and rest of code and remove places for typos. This could include
- using
component_table
rather thancomptable
everywhere - Decide whether to standardize ond_table_score
vsmean_metric_score
- Have the PCA component table better match the ICA component table (primarily changing fromrationale
toclassification_tags
terminology- Make all classification and classification_tag labels either fully case insensitive or changed to match the capitalization of where classification and tag labels are defined.
- Create a field in the tree json for
classification_tag_explanations
In it would be a dictionary for each tag in the tree and an explanation for what that tag means. This would replace the hard-coded tedica-codes
-
tedana_reclassify.py
better uses the new selector class structure and is cleaner thantedana.py
Clean uptedana.py
to better use this structure.- As part of this, references/citations are hard-coded into
tedana.py
These are also in the decision tree defining json files so it would be better to more modularly log references.
- As part of this, references/citations are hard-coded into
- Add tree validation to dry run the tree and make sure used metrics exist before a tree is run
@ME-ICA/tedana-devs Josh and I have been working on this and there's been lots of progress. I updated the initial comment to keep track of what's done and what still needs to get done. The big accomplishment is that the minimal decision tree is fully functional with all the structural and functionality changes that were recently discussed. Function docstrings are similarly updated (though I'm sure the won't all render prefectly) and I wrote document explaining the whole process at: https://github.com/jbteves/tedana/blob/JT_DTM/docs/building_decision_trees.rst
As you can see, there's still a good bit to do, including making the functions underlying the kundu decision tree functional again. This obviously isn't ready for a full review, but feedback is welcome.
Here's the poll for scheduling a decision tree walk-through meeting sometime in the next two weeks: https://doodle.com/meeting/participate/id/9b6wn5Ld I've already limited to times I could be available. @tsalo @eurunuela @notZaki & @jbteves all expressed interest in attending.
I was reading the docs again and I think it would be super helpful if every time we talk mention, e.g., necessary_metrics
or functionname
, we linked those keywords to where they are in the minimal decision tree. This way we direct users to an example, which will probably help them understand the docs better.
@handwerkerd had to overwrite your changes for simplicity for mmix/black formatting, sorry.
@eurunuela as an FYI I have created a new class, InputHarvester, that reads an OutputGenerator's "registry" of files from a previous run. You can use this to get all of the information you might want from a tedana run. See "tedana_reclassify.py" for an example of how this is done.
More general FYI: this is a huge breaking change but basically the harder I tried to put --manacc
into this framework, the worse it looked, so I gave up and created a new workflow, tedana_reclassify
that has --manacc
and --manrej
options, it appears to work locally. I can't tell about CI because of the jinja issue.
Codecov Report
Patch coverage: 96.18
% and project coverage change: -4.34
:warning:
Comparison is base (
fb6e255
) 93.30% compared to head (8dcdafa
) 88.97%.
Additional details and impacted files
@@ Coverage Diff @@
## main #756 +/- ##
==========================================
- Coverage 93.30% 88.97% -4.34%
==========================================
Files 28 27 -1
Lines 2346 3373 +1027
Branches 0 617 +617
==========================================
+ Hits 2189 3001 +812
- Misses 157 226 +69
- Partials 0 146 +146
Impacted Files | Coverage Δ | |
---|---|---|
tedana/decomposition/pca.py | 76.61% <ø> (-12.91%) |
:arrow_down: |
tedana/utils.py | 94.59% <ø> (-2.71%) |
:arrow_down: |
tedana/reporting/html_report.py | 91.39% <60.00%> (-8.61%) |
:arrow_down: |
tedana/reporting/static_figures.py | 96.34% <66.66%> (-2.45%) |
:arrow_down: |
tedana/docs.py | 77.35% <77.35%> (ø) |
|
tedana/reporting/dynamic_figures.py | 96.05% <81.25%> (-3.95%) |
:arrow_down: |
tedana/workflows/tedana.py | 80.95% <85.71%> (-8.68%) |
:arrow_down: |
tedana/io.py | 87.37% <87.35%> (-6.64%) |
:arrow_down: |
tedana/workflows/ica_reclassify.py | 97.79% <97.79%> (ø) |
|
tedana/selection/component_selector.py | 99.01% <99.01%> (ø) |
|
... and 7 more |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
I created a useful/ugly bash script to run the same dataset through Main and then this DTM branch with both the kundu and minimal tree & then compare using dtm_tools. Here's the script. It might require adjustments for other systems, but hopefully sharing this will save time for anyone who wants to run comparisons.
I think Kundu in this branch now matches main so I hope several of you can test this out on more data.
For some reason ComponentSelector's docs do not render all of the method docstrings in our readthedocs build. I can't seem to figure out why, though it looks like the only templates are for functions. Does anybody know of a good primer on how to make one for a class and integrate it into our doc structure? Pinging @tsalo and @emdupre in particular.
I see the documentation does not mention the reclassify workflow. The RICA paragraph will need to get updated based on what we write about it.
This comment is a follow-up of https://github.com/ME-ICA/tedana/pull/756/commits/a65a4cfe2c3fa60476b37471d20add68049f1561
@ME-ICA/tedana-devs Today at 2:00PM EST (Your time zone), we are are planning to relase the last version of tedana before the major refactor (v0.0.13) and then merge this PR and release the more modularized version (v23.0.0).
Since this is way-too-many years in the making, we'll do this over zoom. Come join the "fun" at https://nih.zoomgov.com/j/1612837388?pwd%3DK1drdXVkK0xER1hEbkNzbUljQ0ZoUT09&sa=D&source=calendar&usd=2&usg=AOvVaw1cvaBfqiQE-iNPsBQwHmk5 Meeting ID: 161 283 7388 Passcode: 153769