tedana icon indicating copy to clipboard operation
tedana copied to clipboard

[REF] Decision Tree Modularization

Open jbteves opened this issue 3 years ago • 7 comments

Closes #403, #808, #809 Supercedes #592   Changes proposed in this pull request: See #592 This replaces the inflexible decision tree in tedica.py with a modular structure that will allow for multiple default and user-defined decision trees along with a more interpretable and flexible system for tracking and understanding the results.   Noteworthy implemented features / changes decision tree modularization:

  • Main workflow in tedana.py is simplified and a separate workflow tedana_reclassify.py can be used to manually change component classifications. (Removed 1/5 of the lines of code in tedana.py that are just used to handle the manual classification condition)
  • Minimal and kundu decision trees are fully functional
  • Moved towards using the terminology of “Component Selection” rather than “Decision Tree” to refer to the code that’s part of the selection process. “Decision Tree” is still used to more specifically to refer to the steps to classify components.
  • ComponentSelector object created to include common elements from the selection process including the component_table and information about what happens along every step of the decision tree. Additional information that will now be tracked and stored includes:
    • component_table has columns for classification_tags instead of rationale so that each decision tree can include multiple descriptive tasks per component. Also an executed tree should only include accepted or rejected components. ignored components are now accepted with classification_tags like Low variance or Accept borderline
    • cross_component_metrics will save values calculated across components, such as the kappa & rho elbows
    • component_status_table contains how component classifications changed along every node of the decision tree
    • tree contains the executed tree along with everything calculated or changed during execution
    • used_metrics a list of all metrics used when running the tree. (Can potentially be used to calculate a subset of metrics given an inputted tree)
  • The new class is defined in ./selection/ComponentSelector.py, the functions that define each node of a decision tree are in ./section/selection_nodes.py and some key common functions used by selection_nodes are in ./selection/selection_utils.py
    • By convention, functions in selection_nodes.py that can change component classifications, begin with dec_ for decision and functions that calculate cross_component_metrics begin with calc_
    • A key function in selection_nodes.py is dec_left_op_right which can be used to change classifications based on the intersection of 1-3 boolean statements. This means most of the decision tree is modular functions that calculate cross_component_metrics and then tests of boolean conditional statements.
  • io.py is now used to output a registry (default is desc-tedana_registry.json) of all generated file names that can then be read in to load data rather than requiring follow-up programs, like RICA, to have multiple inputted file names.
  • New documentation building_decision_trees.rst gives a holistic view of everything in the component selection process
  • Some terminology changes, just as using component_table instead of comptable in code   Remaining work to do before merging
  • [ ] Improve documentation
    • [x] Link building_decision_trees.rst to the rest of the documentation
    • [ ] building_decision_trees.rst is both a guide for developers and an explanation of outputs that any user might want. Make sure the user-guide parts are more accessible within the existing documentation
    • [ ] rationale and descriptions of rationale are in several places in the documentation. Remove all and replace with an explanation of classification_tags
    • [ ] Make sure documentation and function docstrings render correctly. Proofread
  • [x] Triple check the kundu tree in ./resources/decusion_trees/kundu.json matches the decision tree code in Main
  • [ ] Testing coverage is already very good. While this is fresh in memory, try to improve testing coverage more.
  • [x] tedana reports
    • [x] Add visualization of kappa and rho elbows to tedana reports
    • [x] Add classification_tags to hover text
  • [x] Take a look at the rho elbow for the minimal tree to see if it’s too aggressive (easier to do after updates to reports)
  • [x] Change provisionalreject to unclassified in the kundu decision tree since those are accepted if not rejected by other criteria
  • [ ] Get the CLI working for tedana_reclassify
  • [x] @jbteves is working on mini-tools to directly compare the results in the current main with the results of this PR. This will include:
    • Script(s) and instructions to make it easier to run both main & this PR on the same data
    • Count and identify which components changed classifications between versions of the code and the % variance explained by those changes
    • ignored classification in main should be accepted classification in this PR with one of the following classification tags: “Low variance” “Accept borderline” “No provisional accept”

Remaining work where we can really use more help

  • [ ] Once the mini-tools are created, have multiple people run on multiple datasets to make sure it runs, gives plausible results, and users with various levels of expertise understand what’s happening.
    • If you want to help with this, leave a comment or contact @handwerkerd
    • Does Main match the output of the kundu decision tree?
    • Is the minimal decision tree reliably more conservative? The minimal tree should accept some component that were rejected by the kundu tree. It’s possible the kundu tree will accept a few low variance components that are rejected by the minimal tree.
  • [ ] After documentation is cleaned up, we will need both developers and non-developer users to read them for clarity.

There are several improvements that aren’t necessary before merging this PR which will be opened as stand alone issues. Links will be added to the issues as they are opened:

  • Update RICA to work with the output of this PR and tedana_reclassify.py
  • #888 For built-in tedana reports, better use classification tags to allow for dynamic coloring or selection of components This would be better done by someone who is a more experienced bokeh coder
  • #888 Write code to automatically create a flow chart or other summary/visualization for a decision tree. This can also be used to make a run-specific visualization that shows how components were reclassified at each node of the decision tree.
  • tedana_reclassify.py currently just works for manually changing accepted and rejected classifications. Either that function or another can be used to run a follow-up decision tree. Figure out if there are potential use-cases for this functionality and then update code.
  • Harmonize terminology better between selector and rest of code and remove places for typos. This could include   - using component_table rather than comptable everywhere   - Decide whether to standardize on d_table_score vs mean_metric_score   - Have the PCA component table better match the ICA component table (primarily changing from rationale to classification_tags terminology
    • Make all classification and classification_tag labels either fully case insensitive or changed to match the capitalization of where classification and tag labels are defined.
    • Create a field in the tree json for classification_tag_explanations In it would be a dictionary for each tag in the tree and an explanation for what that tag means. This would replace the hard-coded tedica-codes
  • tedana_reclassify.py better uses the new selector class structure and is cleaner than tedana.py Clean up tedana.py to better use this structure.
    • As part of this, references/citations are hard-coded into tedana.py These are also in the decision tree defining json files so it would be better to more modularly log references.
  • Add tree validation to dry run the tree and make sure used metrics exist before a tree is run

jbteves avatar Jul 15 '21 22:07 jbteves

@ME-ICA/tedana-devs Josh and I have been working on this and there's been lots of progress. I updated the initial comment to keep track of what's done and what still needs to get done. The big accomplishment is that the minimal decision tree is fully functional with all the structural and functionality changes that were recently discussed. Function docstrings are similarly updated (though I'm sure the won't all render prefectly) and I wrote document explaining the whole process at: https://github.com/jbteves/tedana/blob/JT_DTM/docs/building_decision_trees.rst

As you can see, there's still a good bit to do, including making the functions underlying the kundu decision tree functional again. This obviously isn't ready for a full review, but feedback is welcome.

handwerkerd avatar Nov 12 '21 20:11 handwerkerd

Here's the poll for scheduling a decision tree walk-through meeting sometime in the next two weeks: https://doodle.com/meeting/participate/id/9b6wn5Ld I've already limited to times I could be available. @tsalo @eurunuela @notZaki & @jbteves all expressed interest in attending.

handwerkerd avatar Mar 17 '22 15:03 handwerkerd

I was reading the docs again and I think it would be super helpful if every time we talk mention, e.g., necessary_metrics or functionname, we linked those keywords to where they are in the minimal decision tree. This way we direct users to an example, which will probably help them understand the docs better.

eurunuela avatar Apr 01 '22 10:04 eurunuela

@handwerkerd had to overwrite your changes for simplicity for mmix/black formatting, sorry.

jbteves avatar Apr 12 '22 19:04 jbteves

@eurunuela as an FYI I have created a new class, InputHarvester, that reads an OutputGenerator's "registry" of files from a previous run. You can use this to get all of the information you might want from a tedana run. See "tedana_reclassify.py" for an example of how this is done.

jbteves avatar Apr 13 '22 14:04 jbteves

More general FYI: this is a huge breaking change but basically the harder I tried to put --manacc into this framework, the worse it looked, so I gave up and created a new workflow, tedana_reclassify that has --manacc and --manrej options, it appears to work locally. I can't tell about CI because of the jinja issue.

jbteves avatar Apr 13 '22 14:04 jbteves

Codecov Report

Patch coverage: 96.18% and project coverage change: -4.34 :warning:

Comparison is base (fb6e255) 93.30% compared to head (8dcdafa) 88.97%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #756      +/-   ##
==========================================
- Coverage   93.30%   88.97%   -4.34%     
==========================================
  Files          28       27       -1     
  Lines        2346     3373    +1027     
  Branches        0      617     +617     
==========================================
+ Hits         2189     3001     +812     
- Misses        157      226      +69     
- Partials        0      146     +146     
Impacted Files Coverage Δ
tedana/decomposition/pca.py 76.61% <ø> (-12.91%) :arrow_down:
tedana/utils.py 94.59% <ø> (-2.71%) :arrow_down:
tedana/reporting/html_report.py 91.39% <60.00%> (-8.61%) :arrow_down:
tedana/reporting/static_figures.py 96.34% <66.66%> (-2.45%) :arrow_down:
tedana/docs.py 77.35% <77.35%> (ø)
tedana/reporting/dynamic_figures.py 96.05% <81.25%> (-3.95%) :arrow_down:
tedana/workflows/tedana.py 80.95% <85.71%> (-8.68%) :arrow_down:
tedana/io.py 87.37% <87.35%> (-6.64%) :arrow_down:
tedana/workflows/ica_reclassify.py 97.79% <97.79%> (ø)
tedana/selection/component_selector.py 99.01% <99.01%> (ø)
... and 7 more

... and 14 files with indirect coverage changes

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

codecov[bot] avatar Aug 09 '22 14:08 codecov[bot]

I created a useful/ugly bash script to run the same dataset through Main and then this DTM branch with both the kundu and minimal tree & then compare using dtm_tools. Here's the script. It might require adjustments for other systems, but hopefully sharing this will save time for anyone who wants to run comparisons.

I think Kundu in this branch now matches main so I hope several of you can test this out on more data.

handwerkerd avatar Nov 29 '22 18:11 handwerkerd

For some reason ComponentSelector's docs do not render all of the method docstrings in our readthedocs build. I can't seem to figure out why, though it looks like the only templates are for functions. Does anybody know of a good primer on how to make one for a class and integrate it into our doc structure? Pinging @tsalo and @emdupre in particular.

jbteves avatar Nov 29 '22 20:11 jbteves

I see the documentation does not mention the reclassify workflow. The RICA paragraph will need to get updated based on what we write about it.

This comment is a follow-up of https://github.com/ME-ICA/tedana/pull/756/commits/a65a4cfe2c3fa60476b37471d20add68049f1561

eurunuela avatar Feb 16 '23 15:02 eurunuela

@ME-ICA/tedana-devs Today at 2:00PM EST (Your time zone), we are are planning to relase the last version of tedana before the major refactor (v0.0.13) and then merge this PR and release the more modularized version (v23.0.0).

Since this is way-too-many years in the making, we'll do this over zoom. Come join the "fun" at https://nih.zoomgov.com/j/1612837388?pwd%3DK1drdXVkK0xER1hEbkNzbUljQ0ZoUT09&sa=D&source=calendar&usd=2&usg=AOvVaw1cvaBfqiQE-iNPsBQwHmk5 Meeting ID: 161 283 7388 Passcode: 153769

handwerkerd avatar May 11 '23 13:05 handwerkerd