Pore-C-Snakemake

Missing input files for rule all:

Open GeoMicroSoares opened this issue 3 years ago • 5 comments

Hi there @eharr , @Priyesh000 , @LynnLy ,

I'm getting the following after running snakemake --use-conda -j 10 --dry-run on my own data after my test ran successfully:

Building DAG of jobs...
MissingInputException in line 42 of /mnt/drive/Pore-C-Snakemake/Snakefile:
Missing input files for rule all:
results/basecall/DpnII_run_1.rd.catalog.yaml
results/basecall/DpnII_run_2.rd.catalog.yaml

I really can't figure out how this catalog.yaml file is created, so I'm not sure how to fix this. DpnII is listed by Biopython, so this shouldn't be an issue... Below are my config files - hopefully someone can spot something going wrong?

  • config/basecalls.tsv:
run_id enzyme refgenome_ids biospecimen fastq_path fast5_directory      sequencing_summary_path
run_1 DpnII Bs_p29 NA ../prelim_check.d/run_1712021.fastq     ../raw.d/run_1712021/all_fast5/       ../raw.d/run_1712021/sequencing_summary.txt
run_2 DpnII draft1 NA ../prelim_check.d/run_1712021.fastq     ../raw.d/run_1712021/all_fast5/       ../raw.d/run_1712021/sequencing_summary.txt
  • config/references.tsv:
refgenome_id refgenome_path
Bs_p29 ../prelim_check.d/reference.fasta
draft1 ../prelim_check.d/reference.fasta

I didn't change config/config.yaml or file_layout.yaml, just deleted phased_vcfs.tsv as I'm not using that.

Thanks in advance.

GeoMicroSoares, Jan 13 '22 14:01

Hi @GeoMicroSoares,

The pipeline expects the run_id and enzyme values to contain no underscores, because underscores are used to delimit the wildcards in the output file names. Can you change the run_ids from run_1 and run_2 to run1 and run2?
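To illustrate why the extra underscore breaks the match, here is a minimal sketch in plain Python. It assumes, hypothetically, that each wildcard is constrained to characters other than an underscore; the regular expression and the `parse` helper are illustrative only and are not taken from the Pore-C-Snakemake Snakefile.

```python
import re

# Hypothetical stand-in for the pipeline's output pattern
# {enzyme}_{run_id}.rd.catalog.yaml, assuming neither wildcard
# is allowed to contain an underscore.
PATTERN = re.compile(
    r"^(?P<enzyme>[^_]+)_(?P<run_id>[^_]+)\.rd\.catalog\.yaml$"
)


def parse(filename):
    """Return the wildcard values for a file name, or None if it cannot match."""
    match = PATTERN.match(filename)
    return match.groupdict() if match else None


# The second underscore leaves no valid way to split the name.
print(parse("DpnII_run_1.rd.catalog.yaml"))
# With run1 the name splits into exactly one enzyme and one run_id.
print(parse("DpnII_run1.rd.catalog.yaml"))
```

Under that assumption, no rule can produce a file for run_1, which would be consistent with the MissingInputException reported above.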

LynnLy, Jan 13 '22 20:01

@LynnLy Could you specify how and where the run_id should be changed? Just in the folder structure?

stasys-hub, Jul 19 '22 10:07

Hi @stasys-hub,

You shouldn't need to change the name of any existing files. The run_id that you specify in the first column of config/basecalls.tsv determines the names of the output files, and must not contain any underscores. Pore-C-Snakemake will read the file specified in the "fastq_path" column (which can be named anything) to create smaller fastq files with the naming structure: basecall/{enzyme}_{run_id}.rd.{batch_id}.fq.gz.
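A quick pre-flight check along these lines can catch the problem before a dry run. This is a rough sketch, not part of Pore-C-Snakemake: the function name `find_underscore_ids` is made up for illustration, and it assumes the file is tab-separated with a header row containing run_id and enzyme columns, as in the example posted above.

```python
import csv
import io


def find_underscore_ids(tsv_text):
    """Return (line_number, column, value) for each run_id or enzyme containing '_'."""
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        for column in ("run_id", "enzyme"):
            value = row.get(column) or ""
            if "_" in value:
                problems.append((line_no, column, value))
    return problems


# Shortened, illustrative table; only the first three columns are shown.
example = (
    "run_id\tenzyme\trefgenome_ids\n"
    "run_1\tDpnII\tBs_p29\n"
    "run2\tDpnII\tdraft1\n"
)
print(find_underscore_ids(example))
```

To check a real configuration, the text of config/basecalls.tsv can be passed in instead of the example string. Only the two columns named in this thread are checked; whether other columns tolerate underscores is not covered here.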

LynnLy, Jul 19 '22 16:07

Thank you very much, @LynnLy ! Gonna try that today.

stasys-hub, Jul 20 '22 05:07

@LynnLy, that solved my problems! Thank you!

stasys-hub, Jul 21 '22 11:07