unsupervised_analysis issues

enhance Snakemake report using labels

- see other modules for inspiration (below are only examples)
- **Implement labels** → removes long paths and makes reports much cleaner!

```python
labels={ "data": "{gene}", "type": "genome track", "misc":...
```

sreichl

enhancement

consider PCA parameter of % of variance to keep to speed up for large data
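
One way to read this request is as a variance threshold: keep only as many principal components as are needed to reach a configured fraction of the total variance. The sketch below illustrates the selection step with plain NumPy. The function name and the `variance_to_keep` parameter are hypothetical and not taken from the module. scikit-learn's `PCA` accepts a float `n_components` between 0 and 1 for the same purpose, which may be the simpler route if the module already uses it.

```python
import numpy as np


def n_components_for_variance(X, variance_to_keep=0.9):
    """Smallest number of principal components whose cumulative
    explained-variance ratio reaches `variance_to_keep` (hypothetical helper)."""
    X_centered = X - X.mean(axis=0)
    # Squared singular values of the centered data are proportional
    # to the variance explained by each principal component.
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    explained = singular_values ** 2
    total = explained.sum()
    if total == 0:
        raise ValueError("data has zero variance")
    cumulative = np.cumsum(explained / total)
    # First index where the cumulative ratio reaches the threshold.
    k = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    # Guard against floating-point round-off when the threshold is 1.0.
    return min(k, len(cumulative))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: a few high-variance features followed by low-variance noise.
    scales = np.array([10.0, 5.0, 1.0] + [0.1] * 17)
    X = rng.normal(size=(500, 20)) * scales
    print(n_components_for_variance(X, variance_to_keep=0.9))
```

Downstream steps would then run on the reduced matrix, which is where the speed-up for large data would come from.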

sreichl

enhancement

clustification: train/predict on approximated manifold instead of original space

- high-dimensional UMAP/densMAP embedding (pro: non-linear; con: requires parameters)
- PCA (pro: no parameters; con: linear)
- Laplacian / spectral space (pro: more topological; con: requires parameters and more...
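
As a rough illustration of the PCA option, the sketch below fits a linear projection, then trains and predicts in the projected space instead of the original feature space. It makes no claim about how clustification is implemented. All function names are hypothetical, and a nearest-centroid classifier is swapped in purely to keep the example dependency-free. Any classifier could take its place.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the feature means and the top principal axes of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, components):
    """Map samples into the low-dimensional PCA space."""
    return (X - mean) @ components.T


def fit_centroids(Z, labels):
    """Compute one centroid per cluster label in the embedded space."""
    classes = np.unique(labels)
    centroids = np.vstack([Z[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest_centroid(Z, classes, centroids):
    """Assign each embedded sample to the label of its closest centroid."""
    distances = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two well-separated groups in 20 dimensions.
    group_a = rng.normal(size=(50, 20))
    group_b = rng.normal(size=(50, 20))
    group_b[:, :3] += 10.0
    X = np.vstack([group_a, group_b])
    labels = np.array([0] * 50 + [1] * 50)

    mean, components = fit_pca(X, n_components=2)
    Z = project(X, mean, components)
    classes, centroids = fit_centroids(Z, labels)
    predicted = predict_nearest_centroid(Z, classes, centroids)
    print((predicted == labels).mean())
```

The same train/predict structure would apply to a UMAP/densMAP or spectral embedding, with the projection step replaced by the respective transform and its parameters.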

sreichl

enhancement

clustification: implement XGBoost

- alternative/replacement to RFs
- https://xgboost.readthedocs.io/en/stable/

sreichl

enhancement

clustification: Benchmark clf-based clustering approach

5

Look for clustering benchmark datasets (from various domains) to test the approach and put the results into the documentation → clustering benchmark papers

sreichl

enhancement

improve scatter plot speed

1

- use `geom_point(pch='.')` for a 5x speed increase
- https://ggirelli.info/blog/2021/08/17/speed-up-ggplot
- https://stackoverflow.com/questions/10945707/speed-up-plot-function-for-large-dataset/33528065#33528065

sreichl

enhancement

consider adding CartoGRAPHs for visualization and DataDiVR export for VR based analysis

Both packages are from MencheLab:

- https://github.com/menchelab/CartoGRAPHs
- https://github.com/menchelab/VRNetzer -> now: https://github.com/menchelab/DataDiVR_WebApp

To do:

- [ ] discuss w/ RB
- [ ] discuss w/ Menchies e.g., Chris H.

sreichl

enhancement

Error plot_dimred_metadata

1

Hi Stephan, another error:

`logs/logs_slurm/plot_dimred_metadata_method=UMAP,n_components=2,parameters=euclidean_15_0.1,sample=subset_id.err`

> rule plot_dimred_metadata:
> input: path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/UMAP/UMAP_euclidean_15_0.1_2_data.csv, path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/UMAP/UMAP_euclidean_15_0.1_2_axes.csv, path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/subset_id/labels.csv, path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/metadata_features.csv, path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/metadata_clusterings.csv
> output: path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/UMAP/plots/UMAP_euclidean_15_0.1_2_metadata.png
> log: logs/rules/plot_metadata_subset_id_UMAP_euclidean_15_0.1_2.log
> jobid: 0
> reason: Forced execution
> wildcards:...

bednarsky

bug

make cluster validation using indices optional

Re-use the existing proportion configuration: treat an empty string "" or 0 (a number is probably better) as "skip validation", and then instruct the target rule accordingly.
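
A minimal sketch of how such a switch could drive the target list, in the plain Python that a Snakefile allows. The config key `sample_proportion` and the example paths are hypothetical placeholders, not names from the module.

```python
def validation_enabled(proportion):
    """Treat None, an empty string, or 0 as 'skip index-based validation'."""
    if proportion is None or proportion == "":
        return False
    return float(proportion) > 0


def build_targets(config, base_targets, validation_targets):
    """Return the target rule's inputs, adding the validation outputs
    only when the proportion setting enables them."""
    targets = list(base_targets)
    # "sample_proportion" is a hypothetical key used for illustration.
    if validation_enabled(config.get("sample_proportion")):
        targets.extend(validation_targets)
    return targets


if __name__ == "__main__":
    base = ["results/clustering.csv"]            # hypothetical path
    validation = ["results/internal_index.csv"]  # hypothetical path
    print(build_targets({"sample_proportion": 0}, base, validation))
    print(build_targets({"sample_proportion": 0.1}, base, validation))
```

Using a number rather than an empty string keeps the config value a single type, which is likely why the issue leans towards 0.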

sreichl

enhancement

UMAP diagnostics bug - numba intersect1d

3

Hi Stephan, here is the bug:

file: `logs/logs_slurm/plot_umap_diagnostics_method=densMAP,parameters=euclidean_15_0.1_2,sample=cancer__primary.err`

error:

> Traceback (most recent call last):
> File "path/to/projects/project/modules/unsupervised_analysis/.snakemake/scripts/tmpgpjrocx8.plot_umap_diagnostics.py", line 41, in `<module>`
> umap.plot.diagnostic(umap_obj, diagnostic_type='neighborhood', nhood_size=min(umap_obj.n_neighbors, 15), ax=ax_diag[1,1])
> File "/path/to/snakemake_conda/4ba29b5deef3de008651353701702e01_/lib/python3.9/site-packages/umap/plot.py", line...

bednarsky

bug
