Add coordinate-based coactivation-based parcellation class
Closes #260. Tagging @DiveicaV in case she wants to look at this.
We are using Chase et al. (2020) as the basis for our general approach, especially the metrics we're using for kernel and order selection.
EDIT: A recommendation from @SBEickhoff is to look at Liu et al. (2020) and Plachti et al. (2019) as well.
To do:
- [x] Support lists of values for `r` and `n` parameters. These correspond to the "filter sizes" in Chase et al. (2020).
- [ ] Determine clustering options
- [ ] Filter size selection step (a sketch of two of these metrics follows after this list)
  - [ ] Metric: misclassified voxels
  - [ ] Metric: variation of information
  - [ ] Metric: silhouette value
  - [ ] Metric: percentage of voxels not related to the dominant parent cluster
  - [ ] Metric: change in inter- versus intra-cluster distance ratio
- [ ] Refactor to easily support ImageCBP and MAMP with limited code duplication
- [ ] Tests
- [ ] Documentation
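As a concrete starting point for the filter-selection metrics above, here's a minimal sketch of two of them (silhouette value and variation of information) on random stand-in data. The `variation_of_information` helper is my own illustration of the standard identity VI(A, B) = H(A) + H(B) - 2I(A, B), not an existing NiMARE or scikit-learn function:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import mutual_info_score, silhouette_score

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 50))  # stand-in (n_voxels, n_features) matrix

# Two clustering solutions to compare, e.g., adjacent cluster orders.
labels_k3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)
labels_k4 = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(data)

def variation_of_information(labels_a, labels_b):
    """VI(A, B) = H(A) + H(B) - 2 * I(A, B), in nats."""
    def entropy(labels):
        _, counts = np.unique(labels, return_counts=True)
        probs = counts / counts.sum()
        return -np.sum(probs * np.log(probs))

    mi = mutual_info_score(labels_a, labels_b)
    return entropy(labels_a) + entropy(labels_b) - 2 * mi

print("silhouette:", silhouette_score(data, labels_k3, metric="euclidean"))
print("VI:", variation_of_information(labels_k3, labels_k4))
```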
Changes proposed in this pull request:
- Add `n` option to `Dataset.get_studies_by_coordinate()` (usage sketch below).
- Draft new `parcellate` module with `CoordCBP` class.
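To illustrate the new `n` option, here's a hedged usage sketch; the dataset path and seed coordinate are placeholders:

```python
from nimare.dataset import Dataset

dset = Dataset("dataset.json")  # placeholder path
xyz = [[0, -52, 26]]  # one seed coordinate (MNI mm); placeholder value

# Existing behavior: studies reporting a focus within r mm of the seed.
ids_within_6mm = dset.get_studies_by_coordinate(xyz, r=6)

# New in this PR: the n studies with foci nearest to the seed.
ids_nearest_50 = dset.get_studies_by_coordinate(xyz, n=50)
```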
@mriedel56 @62442katieb if possible, I'd love it if you could check out the new class (especially the `_fit` method, which does the actual CBP) and give your thoughts; a simplified sketch of that loop follows below. So far, I just have the most basic elements of the algorithm implemented, so I still need input on (1) the clustering algorithm options, (2) the metrics to use, and (3) the outputs to save.
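For reviewers who want the gist without reading the diff, here is roughly the shape of the per-voxel loop. This is a simplified sketch, not the actual `_fit` body; in particular, the binary study-membership profile is a crude stand-in for a proper MACM coactivation profile:

```python
import numpy as np
from sklearn.cluster import KMeans

def cbp_sketch(dset, target_xyz, filter_sizes, cluster_range):
    """Simplified CBP loop: one profile per target voxel, one k-means
    solution per (filter size, cluster order) pair."""
    solutions = {}
    for n in filter_sizes:
        # Build a (n_voxels, n_studies) matrix of study-membership profiles.
        rows = []
        for xyz in target_xyz:
            ids = dset.get_studies_by_coordinate([xyz], n=n)
            rows.append(np.isin(dset.ids, ids).astype(float))
        data = np.vstack(rows)

        for k in cluster_range:
            km = KMeans(n_clusters=k, n_init=10, random_state=0)
            solutions[(n, k)] = km.fit_predict(data)
    return solutions
```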
Ultimately, I want this class to be fairly basic, meaning it won't include too many tunable parameters, with some documentation pointing toward `cbptools` for users who require more control.
Additional questions:
- Should we run PCA before clustering? (See the sketch after this list.) From the sklearn clustering user guide:
  > in very high-dimensional spaces, Euclidean distances tend to become inflated (this is an instance of the so-called “curse of dimensionality”). Running a dimensionality reduction algorithm such as Principal component analysis (PCA) prior to k-means clustering can alleviate this problem and speed up the computations.
- Do we want to leverage sample weights at all, e.g., by weighting studies by their sample sizes? (One option is shown in the sketch after this list.)
- How do we want to structure our outputs? The label maps can go in a standard `MetaResult`, but we have additional information, like filter selection ranges and metrics, that we probably want to output as well.
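On the PCA and sample-weight questions, here's a combined, speculative sketch of what both could look like. Note that k-means `sample_weight` weights rows (voxels here), so weighting *studies* would mean scaling feature columns; the column-scaling shown is just one option, not a recommendation:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 300))        # stand-in (n_voxels, n_studies) matrix
study_n = rng.integers(10, 80, size=300)  # stand-in per-study sample sizes

# PCA before k-means, per the sklearn guidance quoted above.
pipe = Pipeline([
    ("pca", PCA(n_components=50)),
    ("kmeans", KMeans(n_clusters=4, n_init=10, random_state=0)),
])
labels = pipe.fit_predict(data)

# Speculative study weighting: scale each study's column by sqrt(weight), so
# larger studies contribute proportionally more to squared Euclidean distances.
weights = study_n / study_n.mean()
labels_weighted = pipe.fit_predict(data * np.sqrt(weights))
```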
Codecov Report
Base: 88.55% // Head: 84.29% // Decreases project coverage by 4.26% :warning:

Coverage data is based on head (`0c60dd5`) compared to base (`e269941`). Patch coverage: 7.89% of modified lines in this pull request are covered.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #533      +/-   ##
==========================================
- Coverage   88.55%   84.29%   -4.27%
==========================================
  Files          38       36       -2
  Lines        4370     4069     -301
==========================================
- Hits         3870     3430     -440
- Misses        500      639     +139
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| `nimare/parcellate.py` | 0.00% <0.00%> (ø) | |
| `nimare/dataset.py` | 90.33% <100.00%> (+0.37%) | :arrow_up: |
| `nimare/utils.py` | | |
| `nimare/base.py` | | |
| `nimare/__init__.py` | | |
For reference, Neurosynth's existing clustering module: https://github.com/neurosynth/neurosynth/blob/master/neurosynth/analysis/cluster.py
@62442katieb has some code from her naturalistic meta-analysis that may implement some of these metrics: https://github.com/62442katieb/meta-analytic-kmeans/blob/daf3904caad990aeadc89bc98769aaed32857e09/evaluating_clustering_solutions.ipynb