tedana
Generate metrics based on decision tree
Closes #921.
Changes proposed in this pull request:
- This helps ensure that metrics requested in decision trees are reflected in the component table. This will allow users to develop decision trees that use metrics other than the ones currently hardcoded into the tedana workflow code, as long as the metrics are actually defined in the tedana metric collection function.
- Restructure the `ComponentSelector` class to take `component_table`, `cross_component_metrics`, and `status_table` as parameters to `select`, rather than `__init__`. This better fits the scikit-learn organization, in which `__init__` takes hyperparameters and `transform` (which is equivalent to `select`) takes actual data.
- Also add a trailing underscore to any attributes that are set in `select`.
- Initialize the `ComponentSelector` in the main tedana workflow function, then grab the necessary metrics from the object. We also have to ensure that a few extra metrics are calculated, since they're used for the reports.
- `ComponentSelector` initialization is done before fMRI data are loaded, to save time if the user-provided tree has an error.
- When defined, `n_echoes` and `n_vols` are stored in `selector.cross_component_metrics` instead of as separate parameters.
- `ica_reclassify` now starts at the first node in the input `tedtree` that doesn't have an `outputs` key, rather than the first node that isn't defined in an un-run tree. This potentially opens up additional functionality, since `ica_reclassify` could build on multiple previous executions.
- Updates to docstrings and documentation to explain the added functionality.
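The restructuring in the list above can be sketched roughly as follows. This is illustrative only, not tedana's actual implementation: the tree format, node execution, and metric names here are simplified stand-ins.

```python
class ComponentSelector:
    """Sketch of the refactored interface (illustrative, not tedana's real code)."""

    def __init__(self, tree):
        # Only "hyperparameters" (the decision tree) are taken here, so the
        # tree can be validated before any fMRI data are loaded.
        self.tree = tree

    def select(self, component_table, cross_component_metrics, status_table=None):
        # Data enter through select(), mirroring scikit-learn's fit/transform
        # split; attributes set here get a trailing underscore.
        self.component_table_ = component_table
        self.cross_component_metrics_ = dict(cross_component_metrics)
        self.status_table_ = status_table
        for node in self.tree["nodes"]:
            # A node gains an "outputs" key once it has been executed.
            node["outputs"] = {"executed": True}
        return self
```

The payoff of the split is that constructing the selector is cheap and data-free, so tree validation can happen up front, while everything data-dependent is clearly marked by the trailing underscore.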
I still need to update a bunch of tests and the reclassification workflow, since many things depend on the old class.
One thing I'm realizing here is that the current approach of passing the selector into decision functions and modifying it within them is unfortunately fragile. Given that the decision functions primarily modify the component table, I think they could be refactored to just accept the comptable and modify that.
@tsalo I'm realizing that to specify the external metrics in the decision tree JSON, like you requested in https://github.com/tsalo/tedana/pull/14, it will be much easier if we merge this first. I've identified a few of the breaking test issues and I think I can get this working, but it's going to intersect with a bunch of the recent updates to `main`. Do you want to align with `main`, or should I just do that and work from there?
> One thing I'm realizing here is that the current approach of passing the selector into decision functions and modifying it within them is unfortunately fragile. Given that the decision functions primarily modify the component table, I think they could be refactored to just accept the comptable and modify that.
I think some of the `selection_utils` functions just interact with the comptable, but the functions in `selection_nodes.py` interact with multiple fields in the selector.
Which aspect of fragility is bothering you? Having a better sense of that might help me figure out how to resolve that issue.
One other nomenclature thing. If this class is now for metrics and selection and is loaded much earlier in the workflow, is it still the `ComponentSelector` class, with the object being called the `selector`? My inclination is to leave that as is for now to minimize messing things up, but we might want to rename it in a later PR, depending how things are refactored.
> I've identified a few of the breaking test issues and I think I can get this working, but it's going to intersect with a bunch of the recent updates to `main`. Do you want to align with `main` or should I just do that and work from there?
I think my updating from `main` is the easiest way forward, so I just did that.
> Which aspect of fragility is bothering you? Having a better sense of that might help me figure out how to resolve that issue.
My concern is that the functions rely on specific attributes of the `ComponentSelector`. Some attributes were expected to be present after initialization, even though they depend on data that, in this PR, are provided in `select`.
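The trailing-underscore convention adopted in this PR offers one way to make that failure mode explicit. The helper below is hypothetical (not in tedana): it just turns a silent `AttributeError` into a clear message when a decision function reads a `select()`-time attribute too early.

```python
def get_fitted_attr(selector, name):
    """Hypothetical helper: fail loudly when a decision function reads an
    attribute that is only set inside select().

    With the trailing-underscore convention, a missing underscored attribute
    means select() has not been called yet.
    """
    try:
        return getattr(selector, name)
    except AttributeError:
        raise RuntimeError(
            f"{name!r} is only available after select() has been called"
        ) from None
```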
> One other nomenclature thing. If this class is now for metrics and selection and is loaded much earlier in the workflow, is it still the `ComponentSelector` class, with the object being called the `selector`? My inclination is to leave that as is for now to minimize messing things up, but we might want to rename it in a later PR, depending how things are refactored.
I like `DecisionTree`, but agree that we can change it later.
I don't want to keep opening pull requests, but I'm working here https://github.com/handwerkerd/tedana/tree/gen-req-metrics-dh and I'll open a PR when I think this is reasonably stable.
I identified one issue that was specific to tests: a bunch of tests need to use a component table before it is run through the decision tree. This table used to be output by initialization, but the new version runs the selection process in the same function where the component table is added to the selector. I needed to add an un-executed component table into the selector in `test_selection_utils.py`'s `sample_selector` fixture to fix a bunch of testing failures.
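The fixture change might look something like this sketch. All names and the table layout are illustrative stand-ins, not the actual `sample_selector` code.

```python
def sample_selector(tree):
    """Sketch of the fixture change: build a selector and attach an
    un-executed component table, so selection-utility tests can inspect the
    table without running the decision tree first. Names are illustrative."""
    selector = type("Selector", (), {})()  # stand-in for ComponentSelector
    selector.tree = tree
    # Un-executed table: components exist, but nothing is classified yet.
    selector.component_table_ = [
        {"Component": "ICA_00", "classification": "unclassified"},
        {"Component": "ICA_01", "classification": "unclassified"},
    ]
    return selector
```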
The other issue is that, in the previous version, `n_echoes` and `n_vols` were both keys in `selector` and within `selector.cross_component_metrics`, with the latter happening only when the selection process was run. I don't think this was necessary, so now they're only in `selector.cross_component_metrics`.
All tests are now passing on my branch.
`test_selector_properties_smoke` had a similar issue to the other ones, in that it was supposed to run on the initial component table, before the decision process.
There was also a bug in how the component selector interacted with `ica_reclassify`. In `main`, it identified the last node in the initial tree and then executed any added nodes. When you moved this from `__init__` to `select`, you replaced this with the total number of nodes, so it never ran the added nodes. My edit identifies the last node with an `outputs` key (meaning it was executed) and then starts at the next node.
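The resume logic described here can be sketched as follows. This is an illustrative sketch of the idea, not tedana's exact code.

```python
def first_unexecuted_node(tree_nodes):
    """Return the index of the first node lacking an "outputs" key, i.e. the
    first node that has not yet been executed. ica_reclassify can then resume
    from that index, even building on multiple previous executions."""
    for idx, node in enumerate(tree_nodes):
        if "outputs" not in node:
            return idx
    # Every node was already executed; resume after the last one.
    return len(tree_nodes)
```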
Now that tests are passing, I'm going to look through everything to make sure your changes make sense to me in general & I'll then open a PR.
Codecov Report
Attention: Patch coverage is 98.42932%, with 3 lines in your changes missing coverage. Please review.
Project coverage is 89.59%. Comparing base (7c0b91f) to head (72da98e). Report is 1 commit behind head on `main`.
| Files | Patch % | Lines |
|---|---|---|
| tedana/workflows/tedana.py | 90.00% | 1 Missing and 1 partial :warning: |
| tedana/selection/component_selector.py | 98.03% | 1 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #969      +/-   ##
==========================================
+ Coverage   89.45%   89.59%   +0.13%
==========================================
  Files          26       26
  Lines        3481     3478       -3
  Branches      616      614       -2
==========================================
+ Hits         3114     3116       +2
  Misses        213      213
+ Partials      154      149       -5
```
@eurunuela Do you have time to review this in the near-ish future? Both Taylor & I worked on this, so it could use one more pair of eyes.
> @eurunuela Do you have time to review this in the near-ish future? Both Taylor & I worked on this, so it could use one more pair of eyes.
Absolutely! I'll have a look at it next week.
@eurunuela Flagging again in case you have a chance to look this over. The dev call is next week and I'd like to know if there's anything in this PR that needs discussion there.
Yes, sorry. Will definitely review this week.
@tsalo The last round of edits was more me than you. You should probably do the last round of checks to make sure you're happy with this, and then merge.
Can we merge this?
@tsalo You're effectively the second review since I made the last round of changes. If you're happy with this, then merge.