
Create a global parameters list

Open christopher-hakkaart opened this issue 2 years ago • 3 comments

Terminology between pipelines and shared assets can differ. To help preserve shared content and familiarity between pipelines, subworkflows and modules, it would be beneficial to create a reserved ontology. For example, parameter names such as --bwa_index and --bwa should be reserved.

A reserved ontology list needs to be created. There might be examples from elsewhere that we could use as a starting point. We could also scrape all JSON schema files and build a big list (and link to it in the writing pipelines tutorial). The final product could be a list with clear descriptions that developers can use to guide naming conventions.

christopher-hakkaart avatar Sep 23 '22 13:09 christopher-hakkaart

A secondary objective will be to reserve global samplesheet headers.

christopher-hakkaart avatar Sep 23 '22 13:09 christopher-hakkaart

I did a really quick and dirty scrape of the schema.json from the pipelines listed as released on the website. 3 of those did not appear to have a schema.json in master and got skipped: mnaseseq, imcyto, slamseq. This leaves 44 pipelines.

Here are all the params which appeared in more than one pipeline:

outdir	44
email	44
custom_config_version	44
custom_config_base	44
config_profile_description	44
config_profile_contact	44
config_profile_url	44
max_cpus	44
max_memory	44
max_time	44
help	44
email_on_fail	44
plaintext_email	44
monochrome_logs	44
tracedir	43
input	42
publish_dir_mode	42
max_multiqc_email_size	40
validate_params	40
show_hidden_params	40
config_profile_name	39
multiqc_config	38
multiqc_title	33
hook_url	29
igenomes_ignore	28
igenomes_base	27
genome	25
version	24
multiqc_logo	24
multiqc_methods_description	24
fasta	23
skip_multiqc	17
save_reference	15
aligner	14
gtf	13
enable_conda	13
hostnames	12
skip_fastqc	12
clip_r1	10
three_prime_clip_r1	10
clip_r2	9
three_prime_clip_r2	9
save_trimmed	9
gff	8
skip_trimming	8
trim_nextseq	7
star_index	7
seq_center	7
skip_qc	7
protocol	6
save_unaligned	6
save_align_intermeds	6
gene_bed	6
bwa_index	6
skip_preseq	5
single_end	5
enzyme	4
trim_fastq	4
star_ignore_sjdbgtf	4
read_length	4
skip_alignment	4
save_merged_fastq	4
blacklist	4
skip_igv	4
skip_peak_qc	4
macs_gsize	4
name	4
database	3
decoy_method	3
precursor_mass_tolerance	3
fragment_mass_tolerance	3
fixed_mods	3
variable_mods	3
min_peptide_length	3
max_peptide_length	3
num_hits	3
subset_max_train	3
klammer	3
description_correct_features	3
quantification_method	3
contrasts	3
singularity_pull_docker_container	3
bowtie2_index	3
save_trimmed_fail	3
skip_cutadapt	3
skip_markduplicates	3
skip_picard_metrics	3
seq_platform	3
keep_dups	3
deseq2_vst	3
skip_deseq2_qc	3
skip_peak_annotation	3
skip_plot_profile	3
root_folder	2
local_input_type	2
add_decoys	2
openms_peakpicking	2
peakpicking_inmemory	2
peakpicking_ms_levels	2
search_engines	2
num_enzyme_termini	2
allowed_missed_cleavages	2
precursor_mass_tolerance_unit	2
fragment_mass_tolerance_unit	2
fragment_method	2
isotope_error_range	2
instrument	2
min_precursor_charge	2
max_precursor_charge	2
max_mods	2
db_debug	2
enable_mod_localization	2
mod_localization	2
luciphor_neutral_losses	2
luciphor_decoy_mass	2
luciphor_decoy_neutral_losses	2
luciphor_debug	2
IL_equivalent	2
posterior_probabilities	2
pp_debug	2
FDR_level	2
train_FDR	2
test_FDR	2
outlier_handling	2
consensusid_algorithm	2
consensusid_considered_top_hits	2
min_consensus_support	2
protein_level_fdr_cutoff	2
protein_quant	2
mass_recalibration	2
transfer_ids	2
targeted_only	2
skip_post_msstats	2
ref_condition	2
enable_qc	2
ptxqc_report_layout	2
skip_pycoqc	2
skip_nanoplot	2
kraken2_db	2
skip_kraken2	2
skip_fastp	2
variant_caller	2
min_mapped_reads	2
mode	2
adapter_fasta	2
save_databases	2
transcript_fasta	2
salmon_index	2
tools	2
trim	2
fai	2
malt_mode	2
stranded	2
skip_quantification	2
skip_bigwig	2
peakcaller	2
annotation_tool	2
with_umi	2
umitools_dedup_stats	2
dragmap	2
skip_tools	2
split_fastq	2
no_intervals	2
snpeff_cache	2
vep_cache	2
dbsnp	2
dbsnp_tbi	2
dict	2
fasta_fai	2
known_indels	2
known_indels_tbi	2
mappability	2
snpeff_db	2
vep_genome	2
vep_species	2
vep_cache_version	2
remove_ribo_rna	2
ribo_database_manifest	2
save_non_ribo_reads	2
bam_csi_index	2
skip_qualimap	2
fasta_index	2
skip_deduplication	2
skip_decoy_generation	2
fragment_size	2
chromap_index	2
keep_multi_map	2
bwa_min_score	2
bamtools_filter_pe_config	2
bamtools_filter_se_config	2
narrow_peak	2
broad_cutoff	2
macs_fdr	2
macs_pvalue	2
min_reps_consensus	2
save_macs_pileup	2
skip_consensus_peaks	2
skip_plot_fingerprint	2
fingerprint_bins	2
krakendb	2
bowtie_index	2
ncrna	2
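
For anyone wanting to reproduce a count like the one above, here is a minimal sketch. It is not the script used for the scrape. It assumes each schema has already been downloaded and parsed into a dict, and that parameters sit either in a top-level `properties` block or in groups under `definitions` (or `$defs`); the function names and the toy schemas are made up for illustration.

```python
from collections import Counter


def param_names(schema):
    """Return the set of parameter names declared in one pipeline schema."""
    names = set(schema.get("properties", {}))
    # Assumption: parameters are grouped under "definitions" (or "$defs").
    for key in ("definitions", "$defs"):
        for group in schema.get(key, {}).values():
            names.update(group.get("properties", {}))
    return names


def shared_params(schemas, min_pipelines=2):
    """Count how many pipelines declare each parameter.

    `schemas` maps a pipeline name to its parsed schema (a dict).
    Returns (name, count) pairs, most widely used first.
    """
    counts = Counter()
    for schema in schemas.values():
        # param_names returns a set, so each pipeline counts once per name.
        counts.update(param_names(schema))
    return [(n, c) for n, c in counts.most_common() if c >= min_pipelines]


# Toy, made-up schemas purely for illustration.
example = {
    "pipeline_a": {"definitions": {"io": {"properties": {"outdir": {}, "input": {}}}}},
    "pipeline_b": {"definitions": {"io": {"properties": {"outdir": {}, "fasta": {}}}}},
}
for name, count in shared_params(example):
    print(f"{name}\t{count}")
```

Setting `min_pipelines=1` would also list the params that appear in only one pipeline.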

There are over 2000 params that appear in only a single pipeline. I am not sure how many of those are similarly named but not identical, and should perhaps be standardised.
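
One possible way to surface similarly named params is a plain string-similarity pass. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the function name and the 0.8 threshold are arbitrary choices for illustration, and the sample names are taken from the list above. A hit only marks a candidate for manual review, since two similar names can legitimately refer to different tools.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(names, threshold=0.8):
    """Return (score, a, b) for every pair of distinct names whose
    similarity ratio is at least `threshold`, best matches first."""
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        score = SequenceMatcher(None, a, b).ratio()
        if score >= threshold:
            pairs.append((score, a, b))
    return sorted(pairs, reverse=True)


# Names taken from the list above; the threshold is an arbitrary guess.
candidates = [
    "kraken2_db", "krakendb",
    "bowtie_index", "bowtie2_index",
    "fasta_fai", "fasta_index", "fai",
]
for score, a, b in similar_pairs(candidates):
    print(f"{a}\t{b}\t{score:.2f}")
```

Comparing every pair is quadratic in the number of names, which should still be manageable for a few thousand params, but this is untested at that scale.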

awgymer avatar Mar 27 '23 11:03 awgymer

On top of that, I'd like a global list of meta.map fields.

maxulysse avatar Nov 07 '23 14:11 maxulysse