unsupervised_analysis
Error plot_dimred_metadata
Hi Stephan,
another error:
logs/logs_slurm/plot_dimred_metadata_method=UMAP,n_components=2,parameters=euclidean_15_0.1,sample=subset_id.err
rule plot_dimred_metadata:
    input: path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/UMAP/UMAP_euclidean_15_0.1_2_data.csv, path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/UMAP/UMAP_euclidean_15_0.1_2_axes.csv, path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/subset_id/labels.csv, path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/metadata_features.csv, path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/metadata_clusterings.csv
    output: path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/UMAP/plots/UMAP_euclidean_15_0.1_2_metadata.png
    log: logs/rules/plot_metadata_subset_id_UMAP_euclidean_15_0.1_2.log
    jobid: 0
    reason: Forced execution
    wildcards: sample=subset_id, method=UMAP, parameters=euclidean_15_0.1, n_components=2
    threads: 2
    resources: mem_mb=128000, disk_mb=1000, tmpdir=/tmp

Activating conda environment: ../../../../../../path/to/snakemake_conda/7e3a48a04ecb72cc15f09fd456de7cf6_
Error in if (all(metadata[[col]] == round(metadata[[col]]))) { :
  missing value where TRUE/FALSE needed
Execution halted
Not cleaning up path/to/projects/project/modules/unsupervised_analysis/.snakemake/scripts/tmpa5ioybfx.plot_2d.R
[Thu Feb 29 10:43:21 2024]
Error in rule plot_dimred_metadata:
    jobid: 0
    output: path/to/projects/project/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/unsupervised_analysis/subset_id/UMAP/plots/UMAP_euclidean_15_0.1_2_metadata.png
    log: logs/rules/plot_metadata_subset_id_UMAP_euclidean_15_0.1_2.log (check log file(s) for error message)
    conda-env: /path/to/snakemake_conda/7e3a48a04ecb72cc15f09fd456de7cf6_

RuleException:
CalledProcessError in line 69 of path/to/projects/project/modules/unsupervised_analysis/workflow/rules/visualization.smk:
Command 'source /path/to/miniconda3/bin/activate '/path/to/snakemake_conda/7e3a48a04ecb72cc15f09fd456de7cf6_'; set -eo pipefail; Rscript --vanilla path/to/projects/project/modules/unsupervised_analysis/.snakemake/scripts/tmpa5ioybfx.plot_2d.R' returned non-zero exit status 1.
  File "path/to/projects/project/modules/unsupervised_analysis/workflow/rules/visualization.smk", line 69, in __rule_plot_dimred_metadata
  File "/path/to/miniconda3/envs/snakemake7_15_2/lib/python3.10/concurrent/futures/thread.py", line 58, in run
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
My sample sheet looks like this:
name,data,metadata,samples_by_features
subset_name,/path/to/results/demultiplexing/first_batch_of_samples/scvi/X_scVI__subset_name.csv,/path/to/results/demultiplexing/first_batch_of_samples/unsupervised_analysis/subset_name/labels.csv,1
My data file contains the scVI coordinates:
,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29
sID367_AAACAGCCAAGTTATC-1,0.8766483,0.011868089,-0.14010176,0.0021616258,-1.7530527,0.8266239,0.05426112,-0.29800522,-0.5499921,0.40404356,0.55395395,-0.011951547,0.070310116,0.1317476,0.09836078,-1.216123,-0.9669612,0.44787252,1.2134984,-1.6698662,-0.88864696,-0.31392187,-0.18586576,-1.0976224,-0.89937776,0.7491747,-0.39786023,-0.3978194,0.009200797,-0.009404337
sID367_AAACATGCAACTAGCC-1,1.2735096,-1.2301883,0.4899947,0.004316354,-0.4477949,-1.0801263,-0.41472286,-0.10565293,-1.0443822,0.1124156,0.71335185,0.01858944,0.069815695,1.8595614,0.9100859,-0.7941134,0.13103442,-0.38214165,0.01599136,0.719963,-1.0942267,0.033875763,0.5481672,-0.029896438,-1.0036578,-0.7464532,-0.04965532,0.33992022,0.016930878,0.016150381
sID367_AAACATGCACATGCTA-1,0.25002998,-0.112220734,-0.36979952,0.027052928,-0.23896998,-0.51691395,1.0869765,1.0108525,-0.81537515,0.71203756,-0.94883174,-0.014021037,0.07627165,-1.5595407,-0.6811844,-0.051620245,1.4360468,0.37079245,0.6642489,1.3201993,0.53024554,1.7682714,1.0888612,-0.40217578,-0.3562716,-0.63303614,0.22093366,0.09114313,0.008224259,-0.0056118146
The metadata file is a CSV with multiple categorical columns, but also numerical columns such as gene_module_scores. One of the columns is set as metadata_of_interest: ["sampleid__donor"] in the config.
The only other thing I changed compared to the example config (apart from the paths, of course) is sample_proportion: 0.3, to increase iteration speed.
Thank you for your help and the amazing pipelines!
I overwrote all NaN values with the string 'unknown', and the pipeline now runs through. FYI: my NaNs were np.nan values, as stored in adata.obs.
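For anyone hitting the same error, here is a minimal sketch of the workaround described above, assuming the metadata is available as a pandas DataFrame (for example adata.obs) before it is written to the metadata CSV. The helper name fill_missing_metadata and the toy column values are made up for illustration; they are not part of the pipeline.

```python
import numpy as np
import pandas as pd


def fill_missing_metadata(metadata: pd.DataFrame, placeholder: str = "unknown") -> pd.DataFrame:
    """Return a copy of `metadata` with every missing value replaced by `placeholder`."""
    # Cast to object first so that numeric columns can hold the string placeholder.
    as_object = metadata.astype(object)
    # Keep values where they are present, use the placeholder where they are missing.
    return as_object.where(metadata.notna(), placeholder)


# Toy stand-in for adata.obs; column names and values are hypothetical.
obs = pd.DataFrame(
    {
        "sampleid__donor": ["donor_a", np.nan, "donor_b"],
        "gene_module_score": [1.0, np.nan, 2.0],
    },
    index=["cell_1", "cell_2", "cell_3"],
)

cleaned = fill_missing_metadata(obs)
print(cleaned)
```

Note that a numerical column that contained NaNs then holds a mix of numbers and the string 'unknown', so it will presumably be treated as text rather than as numbers downstream. If that is not wanted, dropping the affected rows or columns may be the better choice.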
Thanks for reporting. Solved by clearer instructions in the documentation.