unsupervised_analysis issues

improve data loading speed with Dask or NumPy

test it for, e.g., pca.py

**Dask**: Dask is a parallel computing library that integrates with pandas, NumPy, and scikit-learn. It can handle larger-than-memory datasets and can distribute the computation across...

sreichl

enhancement

address slow heatmaps

define "too large": e.g., >10,000 samples/cells?

ideas
- for large data (define too large?) do not plot heatmaps of features and data, but instead determine distance matrices and show those...
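A minimal sketch of the idea above, assuming NumPy is available: below a sample threshold the feature matrix is plotted as-is, above it a pairwise distance matrix is returned instead. The helper names (`pairwise_euclidean`, `choose_heatmap_matrix`) and the default threshold are hypothetical; the 10,000 value only mirrors the example in the issue. Note that an n x n distance matrix grows quadratically with the number of samples, so very large data would still need subsampling or aggregation, which this sketch does not cover.

```python
import numpy as np


def pairwise_euclidean(X):
    """Return the (n_samples, n_samples) Euclidean distance matrix of X."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X * X, axis=1)
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 * x.y
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clip tiny negatives caused by round-off
    np.fill_diagonal(d2, 0.0)
    return np.sqrt(d2)


def choose_heatmap_matrix(X, max_samples=10_000):
    """Decide what a heatmap should show (hypothetical helper).

    Returns ("features", X) for small data and
    ("distances", D) once the number of samples exceeds max_samples.
    """
    X = np.asarray(X, dtype=float)
    if X.shape[0] <= max_samples:
        return "features", X
    return "distances", pairwise_euclidean(X)
```

The returned matrix can then be handed to whatever heatmap function the workflow already uses.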

sreichl

enhancement

diagnostic plot of all aggregated clustering results

1

- idea: a barplot ordered by number of clusters within each clustering
- [ ] research alternatives that are common in the field
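One possible reading of the barplot idea, sketched with the standard library only: count the distinct cluster labels in each clustering result and order the clusterings by that count. The function name and the clustering names in the usage example are hypothetical, and the input is assumed to be a mapping from clustering name to one label per sample.

```python
def ordered_cluster_counts(clusterings):
    """Return (name, n_clusters) pairs sorted by number of clusters.

    clusterings: dict mapping a clustering name to a sequence of
    per-sample cluster labels (assumed input layout).
    Ties are broken alphabetically by name so the order is stable.
    """
    counts = {name: len(set(labels)) for name, labels in clusterings.items()}
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Hypothetical example input: three clusterings of the same five samples.
example = {
    "leiden_0.5": [0, 0, 1, 1, 2],
    "kmeans_2": [0, 1, 0, 1, 1],
    "leiden_1.0": [0, 1, 2, 3, 3],
}
bars = ordered_cluster_counts(example)
```

The resulting pairs map directly onto bar positions and heights in any plotting library.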

sreichl

enhancement

clustification: add visualization/diagnostics of performance/convergence over time

1

determine metrics at every iteration and plot the time course at the end, at least for the stopping criterion (max. edge weight), but maybe also for F1 score and accuracy, ...

sreichl

enhancement

new minor release when adapted to very large data

new mini release highlighting bug fixes and adaption to large (120k x 28k) & complex (342 groups of interest/labels) data
- [ ] #36
- [ ] #37
- [...

sreichl

documentation

fix PCA pairplot by showing the legend only when there are fewer than X classes (e.g., 10?)

sreichl

bug

implement significance analysis for clustering

1

Significance analysis for clustering with single-cell RNA-sequencing data https://www.nature.com/articles/s41592-023-01933-9

sreichl

enhancement

reimplement internal cluster indices supporting different metrics and to improve performance

- Current implementation (clusterCrit) is fast on its own but does not reuse distance matrices that could be determined only once.
- Only the Euclidean metric is supported, extension to support...
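The reuse idea can be sketched as follows, assuming scikit-learn is available. The distance matrix is computed once with a non-Euclidean metric and then passed to an index that accepts precomputed distances. The silhouette score is used here only as a stand-in for one internal index; it is not a replacement for the full set of clusterCrit indices, and the toy data and label names are made up for illustration.

```python
import numpy as np
from sklearn.metrics import pairwise_distances, silhouette_score

# Toy data (assumption): two well-separated groups of 10 samples each.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 3)),
    rng.normal(5.0, 0.1, size=(10, 3)),
])

# Compute the distance matrix once, with a metric other than Euclidean.
D = pairwise_distances(X, metric="manhattan")

# Several clusterings of the same samples (hypothetical names).
clusterings = {
    "good": np.array([0] * 10 + [1] * 10),
    "bad": np.array([0, 1] * 10),
}

# Reuse the same matrix for every clustering instead of recomputing it.
scores = {
    name: silhouette_score(D, labels, metric="precomputed")
    for name, labels in clusterings.items()
}
```

The same precomputed matrix could feed any other index that is defined purely in terms of pairwise distances.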

sreichl

enhancement

add Variation of Information (VI) and Split/Join as external indices

- consider Variation of Information (VI) and Split/Join: https://stats.stackexchange.com/questions/24961/comparing-clusterings-rand-index-vs-variation-of-information
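For reference, both measures can be computed from the contingency counts of two label vectors. The sketch below uses only the standard library; the function names are hypothetical. VI is computed as H(a|b) + H(b|a) in nats, and the split/join distance follows the van Dongen definition, 2n minus the summed best overlaps in each direction. Both are 0 for identical partitions and do not depend on how clusters are numbered.

```python
from collections import Counter
from math import log


def variation_of_information(a, b):
    """VI(a, b) = H(a|b) + H(b|a), in nats."""
    if len(a) != len(b):
        raise ValueError("label vectors must have the same length")
    n = len(a)
    joint = Counter(zip(a, b))  # n_ij: samples in cluster i of a and j of b
    size_a = Counter(a)
    size_b = Counter(b)
    vi = 0.0
    for (i, j), n_ij in joint.items():
        # -p_ij * (log(p_ij / p_i) + log(p_ij / p_j))
        vi -= (n_ij / n) * (log(n_ij / size_a[i]) + log(n_ij / size_b[j]))
    return vi


def split_join_distance(a, b):
    """Split/join (van Dongen) distance between two partitions."""
    if len(a) != len(b):
        raise ValueError("label vectors must have the same length")
    n = len(a)
    joint = Counter(zip(a, b))
    best_for_a = {}  # largest overlap of each cluster of a with any cluster of b
    best_for_b = {}  # largest overlap of each cluster of b with any cluster of a
    for (i, j), n_ij in joint.items():
        best_for_a[i] = max(best_for_a.get(i, 0), n_ij)
        best_for_b[j] = max(best_for_b.get(j, 0), n_ij)
    return 2 * n - sum(best_for_a.values()) - sum(best_for_b.values())
```

Lower values mean more similar clusterings for both measures, the opposite direction to indices such as the Rand index.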

sreichl

enhancement

implement Manifold trustworthiness

https://scikit-learn.org/stable/modules/generated/sklearn.manifold.trustworthiness.html#sklearn.manifold.trustworthiness

determine (if computationally feasible) trustworthiness for every embedding and provide it in the results
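A minimal sketch of how this could look, assuming scikit-learn and NumPy. `sklearn.manifold.trustworthiness` compares neighborhoods in the original and the embedded space and builds pairwise distances over all samples, so the wrapper below falls back to a random subsample for large inputs. The wrapper name, the `max_samples` default, and the subsampling strategy are assumptions for illustration, not part of the workflow.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness


def embedding_trustworthiness(X, embedding, n_neighbors=5, max_samples=5000, seed=0):
    """Trustworthiness of an embedding, on a subsample if the data is large.

    Hypothetical helper: max_samples caps the quadratic cost of the
    pairwise distance computation.
    """
    X = np.asarray(X)
    embedding = np.asarray(embedding)
    if X.shape[0] > max_samples:
        rng = np.random.default_rng(seed)
        idx = rng.choice(X.shape[0], size=max_samples, replace=False)
        X, embedding = X[idx], embedding[idx]
    return trustworthiness(X, embedding, n_neighbors=n_neighbors)


# Toy usage (assumption): random data embedded with a 2-component PCA.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
embedding = PCA(n_components=2, random_state=0).fit_transform(X)
score = embedding_trustworthiness(X, embedding)
```

The score lies in [0, 1], with higher values meaning that neighbors in the embedding are also neighbors in the original space.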

sreichl

enhancement
