tedana icon indicating copy to clipboard operation
tedana copied to clipboard

Generate metrics based on decision tree

Open tsalo opened this issue 1 year ago • 11 comments

Closes #921.

Changes proposed in this pull request:

  • This helps ensure that metrics requested in decision trees are reflected in the component table. This will allow users to develop decision trees that use metrics other than the ones currently hardcoded into the tedana workflow code, as long as the metrics are actually defined in the tedana metric collection function.
  • Restructure the ComponentSelector class to take component_table, cross_component_metrics, status_table as parameters to select, rather than __init__. This better fits the scikit-learn organization, in which __init__ takes hyperparameters and transform (which is equivalent to select) takes actual data.
    • Also add a trailing underscore to any attributes that are set in select.
  • Initialize the ComponentSelector in main tedana function, then grab the necessary metrics from the object. We also have to ensure that a few extra metrics are calculated, since they're used for the reports.
    • ComponentSelector initialization is done before fMRI data are loaded to save time if the user-provided tree as an error
  • When defined, n_echoes and n_vols are stored in selector.cross_component_metrics instead of as separate parameters
  • ica_reclassify now starts at the first node in the inputted tree that doesn't have an outputs key rather than the first node that isn't defined in an un-run tree. This potentially opens up additional functionality since ica_reclassify could build on multiple previous executions
  • Updates to docstrings and documentation to explain the added functionality.

tsalo avatar Aug 11 '23 18:08 tsalo

I still need to update a bunch of tests and the reclassification workflow, since many things depend on the old class.

tsalo avatar Aug 23 '23 19:08 tsalo

One thing I'm realizing here is that the current approach of passing the selector into decision functions and modifying it within them is unfortunately fragile. Given that the decision functions primarily modify the component table, I think they could be refactored to just accept the comptable and modify that.

tsalo avatar Feb 16 '24 21:02 tsalo

@tsalo I'm realizing that to specify the external metrics in the decision tree json, like you requested in https://github.com/tsalo/tedana/pull/14 it will be much easier if we merge this first. I've identified a few of the breaking test issues and I think I can get this working, but it's going to intersect with a bunch of the recent updates to main. Do you want to align with main or should I just do that and work from there?

handwerkerd avatar Feb 26 '24 17:02 handwerkerd

One thing I'm realizing here is that the current approach of passing the selector into decision functions and modifying it within them is unfortunately fragile. Given that the decision functions primarily modify the component table, I think they could be refactored to just accept the comptable and modify that.

I think some of the selection_util functions just interact with comptable, but the functions in selection_nodes.py interact with multiple fields in selector. Which aspect of fragility is bothering you? Having a better sense of that might help me figure out how to resolve that issue.

handwerkerd avatar Feb 26 '24 17:02 handwerkerd

One other nomenclature thing. If this class is now for metrics and selection and is loaded much earlier in the workflow, is it still the Component_Selector class with the object being called the selector? My inclination is to leave that as is for now to minimize messing things up, but we might want to rename in a later PR depending how things are refactored.

handwerkerd avatar Feb 26 '24 17:02 handwerkerd

I've identified a few of the breaking test issues and I think I can get this working, but it's going to intersect with a bunch of the recent updates to main. Do you want to align with main or should I just do that and work from there?

I think me updating from main is the easiest way forward, so I just did that.

Which aspect of fragility is bothering you? Having a better sense of that might help me figure out how to resolve that issue.

My concern is that the functions rely on specific attributes in the ComponentSelector. Some attributes were expected to be present after initialization, even though they depend on data that, in this PR, are provided in select.

One other nomenclature thing. If this class is now for metrics and selection and is loaded much earlier in the workflow, is it still the Component_Selector class with the object being called the selector? My inclination is to leave that as is for now to minimize messing things up, but we might want to rename in a later PR depending how things are refactored.

I like DecisionTree, but agree that we can change it later.

tsalo avatar Feb 26 '24 21:02 tsalo

I don't want to keep opening pull requests, but I'm working here https://github.com/handwerkerd/tedana/tree/gen-req-metrics-dh and I'll open a PR when I think this is reasonably stable.

I identified one issue that was specific to tests in that a bunch of tests need to use a component_table before it is run through the decision tree. This used to be output with initialization, but, the new version runs the selection process in the same function where the component table is added to the selector. I needed to add un-executed component table into the selector in test_selection_utils.py/sample_selector to fix a bunch of testing failures.

The other issue is that, in the previous version n_echoes and n_vols were both keys in selector and within selector.cross_component_metrics with the later happening only when the selection process is run. I don't think this was necessary so now they're only in selector.cross_component_metrics

handwerkerd avatar Feb 26 '24 22:02 handwerkerd

All tests are now passing on my branch.

test_selector_properties_smoke had a similar issue to other ones in that it was supposed to run on the initial component table before the decision process.

There was also a bug in how the component_selector interacted with ica_reclassify. In main it identified the last node in the initial tree and then executed any added nodes. When you moved this from __init__ to select you replaced this with the total number of nodes so it never ran the added nodes. My edit identifies the last node with an output key (meaning it was executed) and then starts at the next node.

Now that tests are passing, I'm going to look through everything to make sure your changes make sense to me in general & I'll then open a PR.

handwerkerd avatar Feb 27 '24 02:02 handwerkerd

Codecov Report

Attention: Patch coverage is 98.42932% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 89.59%. Comparing base (7c0b91f) to head (72da98e). Report is 1 commits behind head on main.

Files Patch % Lines
tedana/workflows/tedana.py 90.00% 1 Missing and 1 partial :warning:
tedana/selection/component_selector.py 98.03% 1 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #969      +/-   ##
==========================================
+ Coverage   89.45%   89.59%   +0.13%     
==========================================
  Files          26       26              
  Lines        3481     3478       -3     
  Branches      616      614       -2     
==========================================
+ Hits         3114     3116       +2     
  Misses        213      213              
+ Partials      154      149       -5     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Feb 28 '24 16:02 codecov[bot]

@eurunuela Do you have time to review this in the near-ish future? Both Taylor & I worked on this, so it could use one more pair of eyes.

handwerkerd avatar Feb 29 '24 15:02 handwerkerd

@eurunuela Do you have time to review this in the near-ish future? Both Taylor & I worked on this, so it could use one more pair of eyes.

Absolutely! I'll have a look at it next week.

eurunuela avatar Mar 01 '24 09:03 eurunuela

@eurunuela Flagging again in case you have a chance to look this over. The dev call is next week and I'd like to know if there's anything in this PR that needs discussion there.

handwerkerd avatar Mar 12 '24 18:03 handwerkerd

Yes, sorry. Will definitely review this week.

eurunuela avatar Mar 12 '24 18:03 eurunuela

@tsalo The last round of edits were more me than you. You should probably do the last round of checks to make sure you're happy with this & then merge.

handwerkerd avatar Mar 13 '24 14:03 handwerkerd

Can we merge this?

tsalo avatar Mar 20 '24 21:03 tsalo

@tsalo You're effectively the second review since I made the last round of changes. If you're happy with this, then merge.

handwerkerd avatar Mar 20 '24 21:03 handwerkerd