Pore-C-Snakemake
Missing input files for rule all:
Hi there @eharr , @Priyesh000 , @LynnLy ,
I'm getting the following error after running snakemake --use-conda -j 10 --dry-run on my own data (my test run had completed successfully):
Building DAG of jobs...
MissingInputException in line 42 of /mnt/drive/Pore-C-Snakemake/Snakefile:
Missing input files for rule all:
results/basecall/DpnII_run_1.rd.catalog.yaml
results/basecall/DpnII_run_2.rd.catalog.yaml
I really can't figure out how this catalog.yaml file is created, so I'm not sure how to fix this. DpnII is listed by Biopython, so the enzyme name shouldn't be the issue... Below are my config files; hopefully someone can spot what's going wrong.
- config/basecalls.tsv:
run_id enzyme refgenome_ids biospecimen fastq_path fast5_directory sequencing_summary_path
run_1 DpnII Bs_p29 NA ../prelim_check.d/run_1712021.fastq ../raw.d/run_1712021/all_fast5/ ../raw.d/run_1712021/sequencing_summary.txt
run_2 DpnII draft1 NA ../prelim_check.d/run_1712021.fastq ../raw.d/run_1712021/all_fast5/ ../raw.d/run_1712021/sequencing_summary.txt
- config/references.tsv:
refgenome_id refgenome_path
Bs_p29 ../prelim_check.d/reference.fasta
draft1 ../prelim_check.d/reference.fasta
I didn't change config/config.yaml or file_layout.yaml; I just deleted phased_vcfs.tsv, as I'm not using it.
Thanks in advance.
Hi @GeoMicroSoares,
The pipeline expects the run_id and enzyme values not to contain any underscores, since underscores are used to delimit the wildcards in the output file names. Can you change the run_ids from run_1 and run_2 to run1 and run2?
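To illustrate why an underscore inside run_id breaks the file names, here is a minimal Python sketch. It assumes the output is matched with a pattern of the form results/basecall/{enzyme}_{run_id}.rd.catalog.yaml (taken from the error message above). GREEDY models Snakemake's documented default wildcard regex `.+`; NO_UNDERSCORE is a hypothetical stricter constraint. The actual wildcard constraints in the Pore-C-Snakemake Snakefile may differ.

```python
import re

# Greedy ".+" per wildcard, mirroring Snakemake's default wildcard regex.
GREEDY = re.compile(
    r"results/basecall/(?P<enzyme>.+)_(?P<run_id>.+)\.rd\.catalog\.yaml"
)
# Hypothetical stricter constraint: wildcards may not contain "_".
NO_UNDERSCORE = re.compile(
    r"results/basecall/(?P<enzyme>[^_]+)_(?P<run_id>[^_]+)\.rd\.catalog\.yaml"
)


def parse(pattern, path):
    """Return (enzyme, run_id) recovered from path, or None if it does not match."""
    m = pattern.fullmatch(path)
    return (m["enzyme"], m["run_id"]) if m else None


if __name__ == "__main__":
    for run_id in ("run_1", "run1"):
        path = f"results/basecall/DpnII_{run_id}.rd.catalog.yaml"
        print(run_id, parse(GREEDY, path), parse(NO_UNDERSCORE, path))
```

With run_1, the greedy pattern splits the name into enzyme "DpnII_run" and run_id "1", neither of which matches a row in basecalls.tsv, and the strict pattern does not match at all. With run1, both patterns recover ("DpnII", "run1").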
@LynnLy Could you specify how and where the run_id should be changed? Just in the folder structure?
Hi @stasys-hub,
You shouldn't need to rename any existing files. The run_id that you specify in the first column of config/basecalls.tsv determines the names of the output files and must not contain any underscores. Pore-C-Snakemake will read the file specified in the "fastq_path" column (which can be named anything) to create smaller fastq files with the naming structure: basecall/{enzyme}_{run_id}.rd.{batch_id}.fq.gz.
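Before launching a dry run, you can catch this problem early by checking basecalls.tsv yourself. The following is a minimal sketch, not part of Pore-C-Snakemake: check_basecalls and batch_name are hypothetical helpers, the file is assumed to be tab-separated with a header row containing run_id and enzyme, and the batch_id value used below is purely illustrative.

```python
import csv
import io


def check_basecalls(tsv_text):
    """Hypothetical helper: list run_id/enzyme values that contain '_'."""
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        for column in ("run_id", "enzyme"):
            if "_" in row[column]:
                problems.append(f"{column}={row[column]!r} contains an underscore")
    return problems


def batch_name(enzyme, run_id, batch_id):
    """Build a name following basecall/{enzyme}_{run_id}.rd.{batch_id}.fq.gz."""
    return f"basecall/{enzyme}_{run_id}.rd.{batch_id}.fq.gz"


if __name__ == "__main__":
    with open("config/basecalls.tsv") as fh:
        for problem in check_basecalls(fh.read()):
            print(problem)
```

For the config in this thread, check_basecalls reports run_id='run_1' and run_id='run_2'; after renaming them to run1 and run2 it returns an empty list, and batch_name("DpnII", "run1", "0") gives basecall/DpnII_run1.rd.0.fq.gz.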
Thank you very much, @LynnLy! Gonna try that today.
@LynnLy, that solved my problems! Thank you!