Pore-C-Snakemake Missing input files for rule all:

Hi there @eharr , @Priyesh000 , @LynnLy ,

After my test run completed successfully, I ran snakemake --use-conda -j 10 --dry-run on my own data and got the following:

Building DAG of jobs...
MissingInputException in line 42 of /mnt/drive/Pore-C-Snakemake/Snakefile:
Missing input files for rule all:
results/basecall/DpnII_run_1.rd.catalog.yaml
results/basecall/DpnII_run_2.rd.catalog.yaml

I really can't figure out how this catalog.yaml file is created, so I'm not sure how to fix this. DpnII is listed by Biopython, so this shouldn't be an issue... Below are my config files - hopefully someone can spot something going wrong?

config/basecalls.tsv:

run_id  enzyme  refgenome_ids  biospecimen  fastq_path                           fast5_directory                  sequencing_summary_path
run_1   DpnII   Bs_p29         NA           ../prelim_check.d/run_1712021.fastq  ../raw.d/run_1712021/all_fast5/  ../raw.d/run_1712021/sequencing_summary.txt
run_2   DpnII   draft1         NA           ../prelim_check.d/run_1712021.fastq  ../raw.d/run_1712021/all_fast5/  ../raw.d/run_1712021/sequencing_summary.txt

config/references.tsv:

refgenome_id refgenome_path
Bs_p29 ../prelim_check.d/reference.fasta
draft1 ../prelim_check.d/reference.fasta

I didn't change config/config.yaml or file_layout.yaml, just deleted phased_vcfs.tsv as I'm not using that.

Thanks in advance.

Jan 13 '22 14:01 GeoMicroSoares

Hi @GeoMicroSoares,

The pipeline expects run_id and enzyme to contain no underscores, because underscores are used to delimit the wildcards in the output file names. Can you change the run_ids from run_1 and run_2 to run1 and run2?

Jan 13 '22 20:01 LynnLy
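
To illustrate the reply above, here is a minimal Python sketch of why an underscore inside run_id can stop a file name from resolving. It assumes the enzyme and run_id wildcards are each restricted to characters other than underscores; the regular expression and the parse_catalog_name helper are hypothetical stand-ins for illustration, not code taken from the Pore-C-Snakemake Snakefile.

```python
import re

# ASSUMPTION: both wildcards are constrained to contain no underscores,
# so the single "_" between them is an unambiguous delimiter.
# This pattern is illustrative only, not copied from the pipeline.
PATTERN = re.compile(
    r"^(?P<enzyme>[^_]+)_(?P<run_id>[^_]+)\.rd\.catalog\.yaml$"
)


def parse_catalog_name(filename):
    """Return the parsed wildcards as a dict, or None if the name does not fit."""
    match = PATTERN.match(filename)
    return match.groupdict() if match else None


# run_id "run_1" adds a second underscore, so the name no longer fits.
print(parse_catalog_name("DpnII_run_1.rd.catalog.yaml"))  # None

# With run_id "run1" the name splits cleanly.
print(parse_catalog_name("DpnII_run1.rd.catalog.yaml"))
# {'enzyme': 'DpnII', 'run_id': 'run1'}
```

Under that assumption, no rule can claim DpnII_run_1.rd.catalog.yaml as its output, which would be consistent with the MissingInputException reported for rule all.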

@LynnLy Could you specify how and where the run_id should be changed? Just in the folder structure?

Jul 19 '22 10:07 stasys-hub

Hi @stasys-hub,

You shouldn't need to change the name of any existing files. The run_id that you specify in the first column of config/basecalls.tsv determines the names of the output files, and must not contain any underscores. Pore-C-Snakemake will read the file specified in the "fastq_path" column (which can be named anything) to create smaller fastq files with the naming structure: basecall/{enzyme}_{run_id}.rd.{batch_id}.fq.gz.

Jul 19 '22 16:07 LynnLy
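
A quick way to catch this before running the pipeline is to scan the run_id and enzyme columns for underscores. The sketch below is a standalone helper, not part of Pore-C-Snakemake; the function name find_bad_ids is made up for this example, and it assumes the file is tab-separated with a header row as in the config shown earlier in the thread.

```python
import csv
import io


def find_bad_ids(tsv_text, columns=("run_id", "enzyme")):
    """Return (line_number, column, value) for every value containing '_'.

    ASSUMPTION: the input is tab-separated with a header row, like
    config/basecalls.tsv. Line numbers count the header as line 1.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    for line_no, row in enumerate(reader, start=2):
        for col in columns:
            value = row.get(col) or ""
            if "_" in value:
                problems.append((line_no, col, value))
    return problems


# Small in-memory example; the paths are placeholders.
example = (
    "run_id\tenzyme\tfastq_path\n"
    "run_1\tDpnII\ta.fastq\n"
    "run2\tDpnII\tb.fastq\n"
)
print(find_bad_ids(example))  # [(2, 'run_id', 'run_1')]
```

To check a real file, read it first, for example find_bad_ids(open("config/basecalls.tsv").read()), and rename any run_id or enzyme it reports.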

Thank you very much, @LynnLy ! Gonna try that today.

Jul 20 '22 05:07 stasys-hub

@LynnLy, that solved my problems! Thank you!

Jul 21 '22 11:07 stasys-hub
