unsupervised_analysis
A general-purpose Snakemake workflow and MrBiomics module to perform unsupervised analyses (dimensionality reduction and cluster analysis) and create visualizations of high-dimensional data.
- see other modules for inspiration (below are only examples)
- **Implement labels** → removes long paths and makes reports much cleaner! ```python labels={ "data": "{gene}", "type": "genome track", "misc":...
- high-dimensional UMAP/densMAP embedding (pro: non-linear; con: requires parameters)
- PCA (pro: no parameters; con: linear)
- Laplacian / spectral space (pro: more topological; con: requires parameters and more...
- alternative/replacement to RFs
- https://xgboost.readthedocs.io/en/stable/
Look for clustering benchmark datasets (from various domains) to test the approach, and put the results into the documentation → clustering benchmark papers
- use `geom_point(pch='.')` for a 5x speed increase
- https://ggirelli.info/blog/2021/08/17/speed-up-ggplot
- https://stackoverflow.com/questions/10945707/speed-up-plot-function-for-large-dataset/33528065#33528065
Both packages are from MencheLab:
- https://github.com/menchelab/CartoGRAPHs
- https://github.com/menchelab/VRNetzer -> now: https://github.com/menchelab/DataDiVR_WebApp
- [ ] discuss w/ RB
- [ ] discuss w/ Menchies e.g., Chris H.
Hi Stephan, another error: `logs/logs_slurm/plot_dimred_metadata_method=UMAP,n_components=2,parameters=euclidean_15_0.1,sample=subset_id.err`
> rule plot_dimred_metadata:
> input: path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/UMAP/UMAP_euclidean_15_0.1_2_data.csv, path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/UMAP/UMAP_euclidean_15_0.1_2_axes.csv, path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/subset_id/labels.csv, path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/metadata_features.csv, path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/metadata_clusterings.csv
> output: path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/UMAP/plots/UMAP_euclidean_15_0.1_2_metadata.png
> log: logs/rules/plot_metadata_subset_id_UMAP_euclidean_15_0.1_2.log
> jobid: 0
> reason: Forced execution
> wildcards:...
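
When triaging reports like this one, it can help to recover the rule name and wildcards directly from the SLURM log file name. The following is a minimal sketch, assuming the names follow the pattern `<rule>_<key>=<value>,<key>=<value>.err` seen in this report; `parse_slurm_log_name` is a hypothetical helper and not part of the workflow.

```python
import re
from pathlib import PurePath


def parse_slurm_log_name(log_path):
    """Split '<rule>_<key>=<value>,...,<key>=<value>.err' into (rule, wildcards).

    Hypothetical helper. Assumes the first wildcard name contains no
    underscore and that no wildcard value contains a comma.
    """
    name = PurePath(log_path).name
    # Drop the trailing .err/.out extension only (values may contain dots).
    stem = re.sub(r"\.(err|out)$", "", name)

    first_eq = stem.find("=")
    if first_eq == -1:
        # No wildcards encoded in the file name.
        return stem, {}

    # The rule name ends at the last underscore before the first '='.
    split_at = stem.rfind("_", 0, first_eq)
    rule, pairs = stem[:split_at], stem[split_at + 1:]

    wildcards = {}
    for pair in pairs.split(","):
        key, _, value = pair.partition("=")
        wildcards[key] = value
    return rule, wildcards


if __name__ == "__main__":
    rule, wildcards = parse_slurm_log_name(
        "logs/logs_slurm/plot_dimred_metadata_method=UMAP,n_components=2,"
        "parameters=euclidean_15_0.1,sample=subset_id.err"
    )
    print(rule)
    print(wildcards)
```

The parsed values can then be compared against the `rule` and `wildcards` lines of the job log to confirm which job failed.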
Re-use the existing proportion configuration: allow it to be empty (`""`) or `0` (a number is probably better), and then instruct the target rule accordingly.
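
One way this could look is sketched below, assuming the proportion is read from the config and that an empty string or `0` means the dependent outputs should be left out of the target rule. The config key `proportion`, the helper names, and the file names are hypothetical and only serve as illustration.

```python
def proportion_enabled(value):
    """Return True only if the configured proportion is a positive number.

    Treats None, the empty string "" and 0 as "disabled".
    """
    if value is None or value == "":
        return False
    return float(value) > 0


def build_targets(config, base_targets, optional_targets):
    """Collect the files a target rule should request.

    `optional_targets` are added only when the (hypothetical) config key
    `proportion` is set to a positive value.
    """
    targets = list(base_targets)
    if proportion_enabled(config.get("proportion", 0)):
        targets.extend(optional_targets)
    return targets


if __name__ == "__main__":
    base = ["results/a.csv"]        # hypothetical output always requested
    optional = ["results/b.png"]    # hypothetical output tied to the proportion
    print(build_targets({"proportion": ""}, base, optional))
    print(build_targets({"proportion": 0.5}, base, optional))
```

In a Snakefile, a function like `build_targets` could be called from the input of the target rule, so a single existing setting both parameterizes the step and switches it off.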
Hi Stephan, here is the bug:
file: `logs/logs_slurm/plot_umap_diagnostics_method=densMAP,parameters=euclidean_15_0.1_2,sample=cancer__primary.err`
error:
> Traceback (most recent call last):
> File "path/to/projects/project/modules/unsupervised_analysis/.snakemake/scripts/tmpgpjrocx8.plot_umap_diagnostics.py", line 41, in <module>
> umap.plot.diagnostic(umap_obj, diagnostic_type='neighborhood', nhood_size=min(umap_obj.n_neighbors, 15), ax=ax_diag[1,1])
> File "/path/to/snakemake_conda/4ba29b5deef3de008651353701702e01_/lib/python3.9/site-packages/umap/plot.py", line...