
Missing input files at dry-run

Open nacasfer opened this issue 1 year ago • 13 comments

Dear DROP team, I am running a DROP dry run (release 1.2.4) with the command snakemake -n, but I get the following error. The config file and sample sheet are attached.

WARNING: Less than 30 IDs in DROP_GROUP sp
check for missing R packages
MonoallelicExpression has been turned off in the config file
rnaVariantCalling has been turned off in the config file
Structuring dependencies...
Dependencies file generated at: /tmp/tmp6sr26_ja

Building DAG of jobs...
WorkflowError:
WorkflowError:
WorkflowError:
WorkflowError (rule AberrantExpression_pipeline_Counting_mergeCounts_R, line 135, /tmp/tmp6sr26_ja):
Function did not return str or list of str.
MissingInputException: Missing input files for rule markdown:
    output: /media/bio/datosbio2/antonio/drop_discovery/Output/html/AberrantExpression/Counting/gtf108/Summary_abexp.html
    wildcards: file=AberrantExpression/Counting/gtf108/Summary_abexp
    affected files:
        AberrantExpression/Counting/gtf108/Summary_abexp.md
MissingInputException: Missing input files for rule markdown:
    output: /media/bio/datosbio2/antonio/drop_discovery/Output/html/Scripts_AberrantExpression_pipeline_Counting_Datasets.html
    wildcards: file=Scripts_AberrantExpression_pipeline_Counting_Datasets
    affected files:
        Scripts_AberrantExpression_pipeline_Counting_Datasets.md
MissingInputException: Missing input files for rule markdown:
    output: /media/bio/datosbio2/antonio/drop_discovery/Output/html/aberrant-expression-pipeline_index.html
    wildcards: file=aberrant-expression-pipeline_index
    affected files:
        aberrant-expression-pipeline_index.md

The --verbose output is attached as well. What am I messing up? Many thanks in advance, and thanks for this awesome tool. Cheers!

config.yaml.txt sample_annotation.tsv.txt verboseOutput.txt Snakefile.txt

nacasfer avatar Jan 24 '23 13:01 nacasfer

Hi, it seems the values of the GENE_ANNOTATION column are missing for the external counts. It should be: gtf108 to match your gtf file. Have a look here: https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#external-count-examples You can then check first by running: snakemake -n sampleAnnotation
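The check described above can be sketched in a few lines of Python. This is only an illustration, not part of DROP: the function name is hypothetical, and the column names GENE_COUNTS_FILE and RNA_ID are assumptions about the sample annotation layout (only GENE_ANNOTATION and DROP_GROUP are named in this thread). It lists the rows that point to an external count file but leave GENE_ANNOTATION empty.

```python
import csv
import io


def rows_missing_gene_annotation(handle,
                                 counts_col="GENE_COUNTS_FILE",   # assumed column name
                                 annotation_col="GENE_ANNOTATION",
                                 id_col="RNA_ID"):                # assumed column name
    """Return the IDs of rows that have an external count file but no annotation."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = []
    for row in reader:
        has_external_counts = (row.get(counts_col) or "").strip() != ""
        has_annotation = (row.get(annotation_col) or "").strip() != ""
        if has_external_counts and not has_annotation:
            missing.append(row.get(id_col, ""))
    return missing


if __name__ == "__main__":
    # Tiny made-up annotation table, for illustration only.
    example = (
        "RNA_ID\tDROP_GROUP\tGENE_COUNTS_FILE\tGENE_ANNOTATION\n"
        "S1\tsp\t\t\n"
        "C1\tsp\tgeneCounts_60C.tsv.gz\t\n"
        "C2\tsp\tgeneCounts_60C.tsv.gz\tgtf108\n"
    )
    print(rows_missing_gene_annotation(io.StringIO(example)))
```

To run it on a real file, pass an open handle, for example open("sample_annotation.tsv", newline=""). An empty list would suggest that every external-count row already carries an annotation name.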

vyepez88 avatar Jan 24 '23 13:01 vyepez88

Hi Vicente, many thanks for such a fast response. I updated the sample sheet, but I'm still having issues with the exportCounts module when running snakemake -c6 exportCounts (verbose log file attached):

WARNING: Less than 30 IDs in DROP_GROUP sp
check for missing R packages
MonoallelicExpression has been turned off in the config file
rnaVariantCalling has been turned off in the config file
Structuring dependencies...
Dependencies file generated at: /tmp/tmpo5jhoczo

Building DAG of jobs...
WorkflowError in file /tmp/tmpo5jhoczo, line 51:
Function did not return str or list of str.

Also, here is the output of the command snakemake -n sampleAnnotation, which apparently works well now.

WARNING: Less than 30 IDs in DROP_GROUP sp
check for missing R packages
MonoallelicExpression has been turned off in the config file
rnaVariantCalling has been turned off in the config file
Structuring dependencies...
Dependencies file generated at: /tmp/tmpx0_z_0op

Building DAG of jobs...
Job stats:
job                            count    min threads    max threads
Pipeline_SampleAnnotation_R        1              1              1
sampleAnnotation                   1              1              1
total                              2              1              1

[Tue Jan 24 15:27:21 2023]
rule Pipeline_SampleAnnotation_R:
    input: /media/bio/datosbio2/antonio/drop_discovery/sample_annotation.tsv, Scripts/Pipeline/SampleAnnotation.R
    output: /media/bio/datosbio2/antonio/drop_discovery/Output/processed_data/sample_anno/genes_overlapping_HPO_terms.tsv, /media/bio/datosbio2/antonio/drop_discovery/Output/html/Scripts_Pipeline_SampleAnnotation.html
    log: /media/bio/datosbio2/antonio/drop_discovery/.drop/tmp/SampleAnnotation.Rds
    jobid: 1
    reason: Missing output files: /media/bio/datosbio2/antonio/drop_discovery/Output/processed_data/sample_anno/genes_overlapping_HPO_terms.tsv, /media/bio/datosbio2/antonio/drop_discovery/Output/html/Scripts_Pipeline_SampleAnnotation.html
    resources: tmpdir=/tmp

[Tue Jan 24 15:27:21 2023]
localrule sampleAnnotation:
    input: /media/bio/datosbio2/antonio/drop_discovery/Output/processed_data/sample_anno/genes_overlapping_HPO_terms.tsv, /media/bio/datosbio2/antonio/drop_discovery/Output/html/Scripts_Pipeline_SampleAnnotation.html
    jobid: 0
    reason: Input files updated by another job: /media/bio/datosbio2/antonio/drop_discovery/Output/processed_data/sample_anno/genes_overlapping_HPO_terms.tsv, /media/bio/datosbio2/antonio/drop_discovery/Output/html/Scripts_Pipeline_SampleAnnotation.html
    resources: tmpdir=/tmp

Job stats:
job                            count    min threads    max threads
Pipeline_SampleAnnotation_R        1              1              1
sampleAnnotation                   1              1              1
total                              2              1              1

Reasons:
    (check individual jobs above for details)
    input files updated by another job:
        sampleAnnotation
    missing output files:
        Pipeline_SampleAnnotation_R

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

2023-01-24T153510.757027.snakemake.log

Again, many thanks in advance. Best!

nacasfer avatar Jan 24 '23 14:01 nacasfer

Hi, the error seems to be in the mergeCounts_R rule. A potential reason is that the gene annotation used to generate the external count matrix is different from your provided gtf file. Can you verify this? One way of doing that is loading one of your counted samples under {root}/processed_data/aberrant_expression/gtf108/counts and comparing the row names with the ones from your provided count matrix (geneCounts_60C.tsv.gz).
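A related check that needs no counted samples is to compare the gene IDs in the external count matrix directly against the gene_id attributes of the gtf file. The sketch below is a variation on the suggestion above, not a DROP function. It assumes the matrix is tab-separated with a header row and gene IDs in the first column, which may not match every count file; all function names are hypothetical.

```python
import re

# Matches the gene_id attribute in the ninth GTF column, e.g. gene_id "ENSG00000223972";
GENE_ID_PATTERN = re.compile(r'gene_id "([^"]+)"')


def gene_ids_from_gtf(lines):
    """Collect gene_id values from GTF lines, skipping comments and short lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = GENE_ID_PATTERN.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def gene_ids_from_count_matrix(lines):
    """Collect first-column values of a tab-separated matrix (assumes one header row)."""
    ids = set()
    for index, line in enumerate(lines):
        if index == 0:
            continue  # skip the header row
        first_field = line.rstrip("\n").split("\t", 1)[0].strip()
        if first_field:
            ids.add(first_field)
    return ids


def compare_gene_ids(gtf_ids, matrix_ids):
    """Split the two ID sets into shared and one-sided groups."""
    return {
        "shared": gtf_ids & matrix_ids,
        "only_in_matrix": matrix_ids - gtf_ids,
        "only_in_gtf": gtf_ids - matrix_ids,
    }
```

Both readers take any iterable of text lines, so a compressed matrix can be passed as gzip.open("geneCounts_60C.tsv.gz", "rt") and the gtf as a plain open(...) handle. A large "only_in_matrix" set would hint at an annotation mismatch; differences such as version suffixes on the IDs would also show up there.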

vyepez88 avatar Jan 24 '23 15:01 vyepez88

Hi! I am afraid the only folder inside {root}/processed_data/aberrant_expression/gtf108 is params/; there is no counts/ folder at all. Is that a clue for you? There is a {root}/processed_data/aberrant_expression/gtf108/params/counts/ folder. I am attaching one of these files, but they seem OK to me.

40623_countParams.csv

Anyway, I only use release 108 as the gtf file (I checked that the count matrix and the gtf have the same labels, just in case something had gone wrong). I've tried removing the samples with the external count matrix (C1 to N45) from the sample annotation file, but the error is still the same, so I guess the error must be in the way the BAM files are fed to that script. I am completely stuck at this point.

Thanks again for your kind help.

nacasfer avatar Jan 24 '23 15:01 nacasfer

Oh true, that folder will only be populated after you count. Check that all BAM files exist and that each has a corresponding index file (.bai). If they all exist, execute snakemake --cores X sampleAnnotation. An HTML file will be generated. In the histogram at the bottom of the HTML, check that the DROP groups contain the number of samples they should have.
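The BAM and index check can be scripted. The following is a minimal sketch, not a DROP utility: the function names are hypothetical, and it assumes an index is stored next to its BAM file as either sample.bam.bai or sample.bai.

```python
from pathlib import Path


def check_bam(bam_path):
    """Return 'ok', 'missing BAM' or 'missing index' for a single BAM path."""
    bam = Path(bam_path)
    if not bam.is_file():
        return "missing BAM"
    # Accept both common index names: sample.bam.bai and sample.bai
    candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
    if not any(candidate.is_file() for candidate in candidates):
        return "missing index"
    return "ok"


def report_bam_problems(bam_paths):
    """Map every problematic path to its status; an empty dict means all is fine."""
    problems = {}
    for path in bam_paths:
        status = check_bam(path)
        if status != "ok":
            problems[str(path)] = status
    return problems
```

The list of paths could be taken from the BAM column of the sample annotation file (its exact column name is not shown in this thread). Anything returned by report_bam_problems would be worth fixing before rerunning the dry run.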

vyepez88 avatar Jan 24 '23 16:01 vyepez88

Hi Vicente! Everything is OK with the sampleAnnotation output. BAM and VCF files are correctly detected, and the groups in the histogram have the expected composition. Is it mandatory to have the BAM files in the same folder where the drop init command was run? I mean, I ran drop init in the folder set as {root} in the config file, while all the BAM files are outside it (their paths are set in the sample annotation file).

nacasfer avatar Jan 25 '23 07:01 nacasfer

Hi, sorry, I don't understand whether you have executed the pipeline partially or not. Can you execute:

snakemake -n aberrantExpression

vyepez88 avatar Jan 25 '23 16:01 vyepez88

Hi, sure! snakemake -n aberrantExpression

WARNING: Less than 30 IDs in DROP_GROUP sp
check for missing R packages
MonoallelicExpression has been turned off in the config file
rnaVariantCalling has been turned off in the config file
Structuring dependencies...
Dependencies file generated at: /tmp/tmpp606shop

Building DAG of jobs...
WorkflowError:
WorkflowError:
MissingInputException: Missing input files for rule markdown:
    output: /media/bio/datosbio2/antonio/drop_discovery/Output/html/AberrantExpression/Counting/gtf108/Summary_abexp.html
    wildcards: file=AberrantExpression/Counting/gtf108/Summary_abexp
    affected files:
        AberrantExpression/Counting/gtf108/Summary_abexp.md
WorkflowError (rule AberrantExpression_pipeline_Counting_mergeCounts_R, line 135, /tmp/tmpp606shop):
Function did not return str or list of str.
MissingInputException: Missing input files for rule markdown:
    output: /media/bio/datosbio2/antonio/drop_discovery/Output/html/Scripts_AberrantExpression_pipeline_Counting_Datasets.html
    wildcards: file=Scripts_AberrantExpression_pipeline_Counting_Datasets
    affected files:
        Scripts_AberrantExpression_pipeline_Counting_Datasets.md

nacasfer avatar Jan 26 '23 06:01 nacasfer

Hi @nacasfer, did you solve this issue? I'm dealing with something similar.

Regards.

geocarvalho avatar Nov 06 '23 23:11 geocarvalho

Hi, so sorry this slipped. Can you please share your sample annotation file?

vyepez88 avatar Nov 07 '23 08:11 vyepez88

drop_test.zip Hi @vyepez88, can you check it out, please?

geocarvalho avatar Nov 14 '23 18:11 geocarvalho

Can you share your sample annotation file as well?

vyepez88 avatar Nov 22 '23 16:11 vyepez88

Sorry! drop_test.zip

geocarvalho avatar Nov 22 '23 18:11 geocarvalho