website
Create a global parameters list
Terminology between pipelines and shared assets can differ. To help preserve shared content and familiarity between pipelines, subworkflows and modules, it would be beneficial to create a reserved ontology. For example, parameter names such as --bwa_index and --bwa should be reserved.
A reserved ontology list needs to be created. There might be examples from elsewhere that we could use as a starting point. We could also scrape all JSON schema files and build a big list (and link to it in the writing pipelines tutorial). The final product could be a list with clear descriptions that developers can use to guide naming conventions.
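To make the idea of a "list with clear descriptions" more concrete, here is a minimal sketch of what one could look like as data, plus a helper that reports which of a pipeline's params hit a reserved name. Everything here is an assumption for illustration: `ReservedParam`, `RESERVED` and `reserved_in_use` are hypothetical names, and the descriptions are placeholders rather than agreed definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReservedParam:
    """One entry in a (hypothetical) reserved parameter list."""
    name: str
    description: str


# Illustrative entries only: the names come from the proposal above,
# the descriptions are placeholders that would need to be agreed on.
RESERVED = {
    p.name: p
    for p in (
        ReservedParam("bwa_index", "placeholder: description to be agreed"),
        ReservedParam("bwa", "placeholder: description to be agreed"),
    )
}


def reserved_in_use(pipeline_params):
    """Map each reserved name used by a pipeline to its reserved description."""
    return {
        name: RESERVED[name].description
        for name in pipeline_params
        if name in RESERVED
    }
```

A developer could run something like `reserved_in_use(["bwa_index", "outdir"])` against their own params and check that each hit is used with the reserved meaning.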
A secondary objective will be to reserve global samplesheet headers.
I did a really quick and dirty scrape of the schema.json from the pipelines listed as released on the website. 3 of those did not appear to have a schema.json in master and got skipped: mnaseseq, imcyto, slamseq. This leaves 44 pipelines.
Here are all the params which appeared in more than one pipeline:
outdir 44
email 44
custom_config_version 44
custom_config_base 44
config_profile_description 44
config_profile_contact 44
config_profile_url 44
max_cpus 44
max_memory 44
max_time 44
help 44
email_on_fail 44
plaintext_email 44
monochrome_logs 44
tracedir 43
input 42
publish_dir_mode 42
max_multiqc_email_size 40
validate_params 40
show_hidden_params 40
config_profile_name 39
multiqc_config 38
multiqc_title 33
hook_url 29
igenomes_ignore 28
igenomes_base 27
genome 25
version 24
multiqc_logo 24
multiqc_methods_description 24
fasta 23
skip_multiqc 17
save_reference 15
aligner 14
gtf 13
enable_conda 13
hostnames 12
skip_fastqc 12
clip_r1 10
three_prime_clip_r1 10
clip_r2 9
three_prime_clip_r2 9
save_trimmed 9
gff 8
skip_trimming 8
trim_nextseq 7
star_index 7
seq_center 7
skip_qc 7
protocol 6
save_unaligned 6
save_align_intermeds 6
gene_bed 6
bwa_index 6
skip_preseq 5
single_end 5
enzyme 4
trim_fastq 4
star_ignore_sjdbgtf 4
read_length 4
skip_alignment 4
save_merged_fastq 4
blacklist 4
skip_igv 4
skip_peak_qc 4
macs_gsize 4
name 4
database 3
decoy_method 3
precursor_mass_tolerance 3
fragment_mass_tolerance 3
fixed_mods 3
variable_mods 3
min_peptide_length 3
max_peptide_length 3
num_hits 3
subset_max_train 3
klammer 3
description_correct_features 3
quantification_method 3
contrasts 3
singularity_pull_docker_container 3
bowtie2_index 3
save_trimmed_fail 3
skip_cutadapt 3
skip_markduplicates 3
skip_picard_metrics 3
seq_platform 3
keep_dups 3
deseq2_vst 3
skip_deseq2_qc 3
skip_peak_annotation 3
skip_plot_profile 3
root_folder 2
local_input_type 2
add_decoys 2
openms_peakpicking 2
peakpicking_inmemory 2
peakpicking_ms_levels 2
search_engines 2
num_enzyme_termini 2
allowed_missed_cleavages 2
precursor_mass_tolerance_unit 2
fragment_mass_tolerance_unit 2
fragment_method 2
isotope_error_range 2
instrument 2
min_precursor_charge 2
max_precursor_charge 2
max_mods 2
db_debug 2
enable_mod_localization 2
mod_localization 2
luciphor_neutral_losses 2
luciphor_decoy_mass 2
luciphor_decoy_neutral_losses 2
luciphor_debug 2
IL_equivalent 2
posterior_probabilities 2
pp_debug 2
FDR_level 2
train_FDR 2
test_FDR 2
outlier_handling 2
consensusid_algorithm 2
consensusid_considered_top_hits 2
min_consensus_support 2
protein_level_fdr_cutoff 2
protein_quant 2
mass_recalibration 2
transfer_ids 2
targeted_only 2
skip_post_msstats 2
ref_condition 2
enable_qc 2
ptxqc_report_layout 2
skip_pycoqc 2
skip_nanoplot 2
kraken2_db 2
skip_kraken2 2
skip_fastp 2
variant_caller 2
min_mapped_reads 2
mode 2
adapter_fasta 2
save_databases 2
transcript_fasta 2
salmon_index 2
tools 2
trim 2
fai 2
malt_mode 2
stranded 2
skip_quantification 2
skip_bigwig 2
peakcaller 2
annotation_tool 2
with_umi 2
umitools_dedup_stats 2
dragmap 2
skip_tools 2
split_fastq 2
no_intervals 2
snpeff_cache 2
vep_cache 2
dbsnp 2
dbsnp_tbi 2
dict 2
fasta_fai 2
known_indels 2
known_indels_tbi 2
mappability 2
snpeff_db 2
vep_genome 2
vep_species 2
vep_cache_version 2
remove_ribo_rna 2
ribo_database_manifest 2
save_non_ribo_reads 2
bam_csi_index 2
skip_qualimap 2
fasta_index 2
skip_deduplication 2
skip_decoy_generation 2
fragment_size 2
chromap_index 2
keep_multi_map 2
bwa_min_score 2
bamtools_filter_pe_config 2
bamtools_filter_se_config 2
narrow_peak 2
broad_cutoff 2
macs_fdr 2
macs_pvalue 2
min_reps_consensus 2
save_macs_pileup 2
skip_consensus_peaks 2
skip_plot_fingerprint 2
fingerprint_bins 2
krakendb 2
bowtie_index 2
ncrna 2
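For anyone wanting to reproduce or extend a tally like the one above, here is a rough sketch of the counting step. It is not the script that was actually used. It assumes each schema keeps parameters under a top-level "properties" and/or in groups under "definitions" or "$defs", and that the files have already been downloaded into a hypothetical `<root>/<pipeline>/schema.json` layout. All function names are made up for this example.

```python
import json
from collections import Counter
from pathlib import Path


def param_names(schema: dict) -> set:
    """Collect parameter names from one pipeline schema.

    Assumption: params live in top-level "properties" and in groups
    under "definitions" or "$defs", each group with its own "properties".
    """
    names = set(schema.get("properties", {}))
    for key in ("definitions", "$defs"):
        for group in schema.get(key, {}).values():
            names.update(group.get("properties", {}))
    return names


def count_params(schemas: dict) -> Counter:
    """Count in how many pipelines each parameter name appears."""
    counts = Counter()
    for schema in schemas.values():
        # A set per pipeline, so each name counts once per pipeline.
        counts.update(param_names(schema))
    return counts


def shared_params(counts: Counter, minimum: int = 2) -> list:
    """Return (name, count) pairs seen in at least `minimum` pipelines."""
    return [(n, c) for n, c in counts.most_common() if c >= minimum]


def load_schemas(root: Path) -> dict:
    """Load schemas from a hypothetical <root>/<pipeline>/schema.json layout."""
    return {
        path.parent.name: json.loads(path.read_text())
        for path in sorted(root.glob("*/schema.json"))
    }
```

Calling `shared_params(count_params(load_schemas(Path("schemas"))))` would then give the multi-pipeline list, and `minimum=1` would include the single-pipeline params as well.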
There are over 2000 params that appear in only a single pipeline. I am not sure how many of those are similarly named but not identical, and should perhaps be standardised.
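One way to get a first look at the "similarly named but not identical" question is fuzzy string matching over the param names. The sketch below uses `difflib.SequenceMatcher` from the Python standard library. The normalisation (lowercasing and dropping underscores) and the 0.8 threshold are arbitrary assumptions that would need tuning, and `similar_pairs` is a hypothetical helper name.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name: str) -> str:
    """Lowercase and drop underscores so only the letters and digits are compared."""
    return name.lower().replace("_", "")


def similar_pairs(names, threshold=0.8):
    """Return (name_a, name_b, ratio) for distinct names that look alike.

    The threshold is an assumption; lower values flag more candidates.
    """
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        ratio = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 2)))
    # Most similar pairs first.
    return sorted(pairs, key=lambda p: -p[2])
```

With the names from the list above, `krakendb` and `kraken2_db` are flagged as a candidate pair at this threshold. The comparison is quadratic in the number of names, which should still be manageable for a couple of thousand params. The output is only a list of candidates for a human to review, not a decision on what to merge.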
On top of that, I'd like a global list of meta.map fields.