Error about Missing output files

Open ttab963 opened this issue 1 year ago • 3 comments

Hi, team. I first tried to use MAESTER (https://github.com/petervangalen/MAESTER-2021) and maegatk, but neither my own data nor the test data could be processed, so I tried the mgatk test files. I then found that the same problem also happens in mgatk.

I ran this command:

mgatk bcall -i barcode/test_barcode.bam -n bc1 -o bc1d -bt CB -b barcode/test_barcodes.txt -z 

The error appears after the final_sparse_matrices part.

Thu Jul 07 17:56:35 KST 2022: mgatk v0.6.6
Thu Jul 07 17:56:35 KST 2022: Found bam file: barcode/test_barcode.bam for genotyping.
Thu Jul 07 17:56:35 KST 2022: Found file of barcodes to be parsed: barcode/test_barcodes.txt
Thu Jul 07 17:56:35 KST 2022: User specified mitochondrial genome matches .bam file
Thu Jul 07 17:56:37 KST 2022: Finished determining/splitting barcodes for genotyping.
Thu Jul 07 17:56:38 KST 2022: Genotyping samples with 88 threads
Config file bc1d/.internal/parseltongue/snake.gather.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job                           count    min threads    max threads
--------------------------  -------  -------------  -------------
all                               1              1              1
make_depth_table                  1              1              1
make_final_sparse_matrices        1              1              1
total                             3              1              1

Select jobs to execute...

[Thu Jul  7 17:56:38 2022]
rule make_final_sparse_matrices:
    output: bc1d/final/bc1.A.txt.gz, bc1d/final/bc1.C.txt.gz, bc1d/final/bc1.G.txt.gz, bc1d/final/bc1.T.txt.gz, bc1d/final/bc1.coverage.txt.gz
    jobid: 2
    reason: Missing output files: bc1d/final/bc1.A.txt.gz, bc1d/final/bc1.G.txt.gz, bc1d/final/bc1.coverage.txt.gz, bc1d/final/bc1.C.txt.gz, bc1d/final/bc1.T.txt.gz
    resources: tmpdir=/tmp

Config file bc1d/.internal/parseltongue/snake.scatter.yaml is extended by additional config specified via the command line.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 88
Rules claiming more threads will be scaled down.
Job stats:
job                   count    min threads    max threads
------------------  -------  -------------  -------------
all                       1              1              1
make_sample_list          1              1              1
process_one_sample        3              1              1
total                     5              1              1

Select jobs to execute...

[Thu Jul  7 17:56:39 2022]
rule process_one_sample:
    input: bc1d/.internal/samples/CACCACTAGGAGGCGA-1.bam.txt
    output: bc1d/temp/ready_bam/CACCACTAGGAGGCGA-1.qc.bam, bc1d/temp/ready_bam/CACCACTAGGAGGCGA-1.qc.bam.bai, bc1d/qc/depth/CACCACTAGGAGGCGA-1.depth.txt, bc1d/temp/sparse_matrices/CACCACTAGGAGGCGA-1.A.txt, bc1d/temp/sparse_matrices/CACCACTAGGAGGCGA-1.C.txt, bc1d/temp/sparse_matrices/CACCACTAGGAGGCGA-1.G.txt, bc1d/temp/sparse_matrices/CACCACTAGGAGGCGA-1.T.txt, bc1d/temp/sparse_matrices/CACCACTAGGAGGCGA-1.coverage.txt
    jobid: 3
    reason: Missing output files: bc1d/qc/depth/CACCACTAGGAGGCGA-1.depth.txt
    wildcards: sample=CACCACTAGGAGGCGA-1
    resources: tmpdir=/tmp

[Thu Jul  7 17:56:39 2022]
rule process_one_sample:
    input: bc1d/.internal/samples/GCCTAGGCAGTTCGGC-1.bam.txt
    output: bc1d/temp/ready_bam/GCCTAGGCAGTTCGGC-1.qc.bam, bc1d/temp/ready_bam/GCCTAGGCAGTTCGGC-1.qc.bam.bai, bc1d/qc/depth/GCCTAGGCAGTTCGGC-1.depth.txt, bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.A.txt, bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.C.txt, bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.G.txt, bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.T.txt, bc1d/temp/sparse_matrices/GCCTAGGCAGTTCGGC-1.coverage.txt
    jobid: 4
    reason: Missing output files: bc1d/qc/depth/GCCTAGGCAGTTCGGC-1.depth.txt
    wildcards: sample=GCCTAGGCAGTTCGGC-1
    resources: tmpdir=/tmp

[Thu Jul  7 17:56:39 2022]
rule process_one_sample:
    input: bc1d/.internal/samples/CTAACTTAGAGCCACA-1.bam.txt
    output: bc1d/temp/ready_bam/CTAACTTAGAGCCACA-1.qc.bam, bc1d/temp/ready_bam/CTAACTTAGAGCCACA-1.qc.bam.bai, bc1d/qc/depth/CTAACTTAGAGCCACA-1.depth.txt, bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.A.txt, bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.C.txt, bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.G.txt, bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.T.txt, bc1d/temp/sparse_matrices/CTAACTTAGAGCCACA-1.coverage.txt
    jobid: 2
    reason: Missing output files: bc1d/qc/depth/CTAACTTAGAGCCACA-1.depth.txt
    wildcards: sample=CTAACTTAGAGCCACA-1
    resources: tmpdir=/tmp

gzip: bc1d/final/bc1.A.txt: No such file or directory
gzip: bc1d/final/bc1.C.txt: No such file or directory
gzip: bc1d/final/bc1.G.txt: No such file or directory
gzip: bc1d/final/bc1.T.txt: No such file or directory
gzip: bc1d/final/bc1.coverage.txt: No such file or directory
Error in checkGrep(grep(".A.txt", files)) :
  Improper folder specification; file missing / extra file present. See documentation
Calls: importMito -> checkGrep
Execution halted
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2022-07-07T175638.678416.snakemake.log
[Thu Jul  7 17:56:47 2022]
Finished job 4.
1 of 5 steps (20%) done
[Thu Jul  7 17:56:48 2022]
Finished job 2.
2 of 5 steps (40%) done
[Thu Jul  7 17:56:48 2022]
Finished job 3.
3 of 5 steps (60%) done
Select jobs to execute...

[Thu Jul  7 17:56:48 2022]
rule make_sample_list:
    input: bc1d/qc/depth/CTAACTTAGAGCCACA-1.depth.txt, bc1d/qc/depth/CACCACTAGGAGGCGA-1.depth.txt, bc1d/qc/depth/GCCTAGGCAGTTCGGC-1.depth.txt
    output: bc1d/temp/scattered.allSamples.txt
    jobid: 1
    reason: Missing output files: bc1d/temp/scattered.allSamples.txt; Input files updated by another job: bc1d/qc/depth/CACCACTAGGAGGCGA-1.depth.txt, bc1d/qc/depth/GCCTAGGCAGTTCGGC-1.depth.txt, bc1d/qc/depth/CTAACTTAGAGCCACA-1.depth.txt
    resources: tmpdir=/tmp

[Thu Jul  7 17:56:49 2022]
Finished job 1.
4 of 5 steps (80%) done
Select jobs to execute...

[Thu Jul  7 17:56:49 2022]
localrule all:
    input: bc1d/temp/scattered.allSamples.txt
    jobid: 0
    reason: Input files updated by another job: bc1d/temp/scattered.allSamples.txt
    resources: tmpdir=/tmp

[Thu Jul  7 17:56:49 2022]
Finished job 0.
5 of 5 steps (100%) done
Complete log: .snakemake/log/2022-07-07T175638.766786.snakemake.log

And this is the list of output files:

bc1d/:
total 20K
drwxrwxr-x 2 sjp sjp 4.0K Jul  7  2022 fasta
drwxrwxr-x 2 sjp sjp 4.0K Jul  7  2022 final
drwxrwxr-x 4 sjp sjp 4.0K Jul  7  2022 logs
drwxrwxr-x 4 sjp sjp 4.0K Jul  7  2022 qc
drwxrwxr-x 7 sjp sjp 4.0K Jul  7  2022 temp

bc1d/fasta:
total 24K
-rw-rw-r-- 1 sjp sjp 17K Jul  7  2022 chrM.fasta
-rw-rw-r-- 1 sjp sjp  19 Jul  7  2022 chrM.fasta.fai

bc1d/final:
total 120K
-rw-rw-r-- 1 sjp sjp 119K Jul  7  2022 chrM_refAllele.txt

bc1d/logs:
total 28K
-rw-rw-r-- 1 sjp sjp  433 Jul  7  2022 base.mgatk.log
-rw-rw-r-- 1 sjp sjp  476 Jul  7  2022 bc1.parameters.txt
-rw-rw-r-- 1 sjp sjp    0 Jul  7  2022 bc1.snakemake_gather.log
-rw-rw-r-- 1 sjp sjp    0 Jul  7  2022 bc1.snakemake_scatter.log
-rw-rw-r-- 1 sjp sjp  10K Jul  7  2022 bc1.snakemake_scatter.stats
drwxrwxr-x 2 sjp sjp 4.0K Jul  7  2022 filterlogs
drwxrwxr-x 2 sjp sjp 4.0K Jul  7  2022 rmdupslogs

bc1d/logs/filterlogs:
total 12K
-rw-rw-r-- 1 sjp sjp 21 Jul  7  2022 CACCACTAGGAGGCGA-1.filter.log
-rw-rw-r-- 1 sjp sjp 21 Jul  7  2022 CTAACTTAGAGCCACA-1.filter.log
-rw-rw-r-- 1 sjp sjp 21 Jul  7  2022 GCCTAGGCAGTTCGGC-1.filter.log

bc1d/logs/rmdupslogs:
total 12K
-rw-rw-r-- 1 sjp sjp 1.5K Jul  7  2022 CACCACTAGGAGGCGA-1.rmdups.log
-rw-rw-r-- 1 sjp sjp 1.5K Jul  7  2022 CTAACTTAGAGCCACA-1.rmdups.log
-rw-rw-r-- 1 sjp sjp 1.5K Jul  7  2022 GCCTAGGCAGTTCGGC-1.rmdups.log

bc1d/qc:
total 8.0K
drwxrwxr-x 2 sjp sjp 4.0K Jul  7  2022 depth
drwxrwxr-x 2 sjp sjp 4.0K Jul  7  2022 quality

bc1d/qc/depth:
total 12K
-rw-rw-r-- 1 sjp sjp 25 Jul  7  2022 CACCACTAGGAGGCGA-1.depth.txt
-rw-rw-r-- 1 sjp sjp 25 Jul  7  2022 CTAACTTAGAGCCACA-1.depth.txt
-rw-rw-r-- 1 sjp sjp 26 Jul  7  2022 GCCTAGGCAGTTCGGC-1.depth.txt

bc1d/qc/quality:
total 0

bc1d/temp:
total 24K
drwxrwxr-x 2 sjp sjp 4.0K Jul  7  2022 barcoded_bams
drwxrwxr-x 2 sjp sjp 4.0K Jul  7  2022 quality
drwxrwxr-x 2 sjp sjp 4.0K Jul  7  2022 ready_bam
-rw-rw-r-- 1 sjp sjp   57 Jul  7  2022 scattered.allSamples.txt
drwxrwxr-x 2 sjp sjp 4.0K Jul  7  2022 sparse_matrices
drwxrwxr-x 2 sjp sjp 4.0K Jul  7  2022 temp_bam

bc1d/temp/barcoded_bams:
total 7.0M
-rw-rw-r-- 1 sjp sjp 2.7M Jul  7  2022 CACCACTAGGAGGCGA-1.bam
-rw-rw-r-- 1 sjp sjp  808 Jul  7  2022 CACCACTAGGAGGCGA-1.bam.bai
-rw-rw-r-- 1 sjp sjp 2.4M Jul  7  2022 CTAACTTAGAGCCACA-1.bam
-rw-rw-r-- 1 sjp sjp  808 Jul  7  2022 CTAACTTAGAGCCACA-1.bam.bai
-rw-rw-r-- 1 sjp sjp 2.1M Jul  7  2022 GCCTAGGCAGTTCGGC-1.bam
-rw-rw-r-- 1 sjp sjp  792 Jul  7  2022 GCCTAGGCAGTTCGGC-1.bam.bai

bc1d/temp/quality:
total 0

bc1d/temp/ready_bam:
total 7.4M
-rw-rw-r-- 1 sjp sjp 2.8M Jul  7  2022 CACCACTAGGAGGCGA-1.qc.bam
-rw-rw-r-- 1 sjp sjp  808 Jul  7  2022 CACCACTAGGAGGCGA-1.qc.bam.bai
-rw-rw-r-- 1 sjp sjp 2.5M Jul  7  2022 CTAACTTAGAGCCACA-1.qc.bam
-rw-rw-r-- 1 sjp sjp  824 Jul  7  2022 CTAACTTAGAGCCACA-1.qc.bam.bai
-rw-rw-r-- 1 sjp sjp 2.1M Jul  7  2022 GCCTAGGCAGTTCGGC-1.qc.bam
-rw-rw-r-- 1 sjp sjp  808 Jul  7  2022 GCCTAGGCAGTTCGGC-1.qc.bam.bai

bc1d/temp/sparse_matrices:
total 3.2M
-rw-rw-r-- 1 sjp sjp 193K Jul  7  2022 CACCACTAGGAGGCGA-1.A.txt
-rw-rw-r-- 1 sjp sjp 458K Jul  7  2022 CACCACTAGGAGGCGA-1.coverage.txt
-rw-rw-r-- 1 sjp sjp 189K Jul  7  2022 CACCACTAGGAGGCGA-1.C.txt
-rw-rw-r-- 1 sjp sjp  99K Jul  7  2022 CACCACTAGGAGGCGA-1.G.txt
-rw-rw-r-- 1 sjp sjp 149K Jul  7  2022 CACCACTAGGAGGCGA-1.T.txt
-rw-rw-r-- 1 sjp sjp 183K Jul  7  2022 CTAACTTAGAGCCACA-1.A.txt
-rw-rw-r-- 1 sjp sjp 457K Jul  7  2022 CTAACTTAGAGCCACA-1.coverage.txt
-rw-rw-r-- 1 sjp sjp 183K Jul  7  2022 CTAACTTAGAGCCACA-1.C.txt
-rw-rw-r-- 1 sjp sjp  94K Jul  7  2022 CTAACTTAGAGCCACA-1.G.txt
-rw-rw-r-- 1 sjp sjp 145K Jul  7  2022 CTAACTTAGAGCCACA-1.T.txt
-rw-rw-r-- 1 sjp sjp 185K Jul  7  2022 GCCTAGGCAGTTCGGC-1.A.txt
-rw-rw-r-- 1 sjp sjp 452K Jul  7  2022 GCCTAGGCAGTTCGGC-1.coverage.txt
-rw-rw-r-- 1 sjp sjp 183K Jul  7  2022 GCCTAGGCAGTTCGGC-1.C.txt
-rw-rw-r-- 1 sjp sjp  93K Jul  7  2022 GCCTAGGCAGTTCGGC-1.G.txt
-rw-rw-r-- 1 sjp sjp 144K Jul  7  2022 GCCTAGGCAGTTCGGC-1.T.txt

bc1d/temp/temp_bam:
total 14M
-rw-rw-r-- 1 sjp sjp 2.7M Jul  7  2022 CACCACTAGGAGGCGA-1.temp0.bam
-rw-rw-r-- 1 sjp sjp 2.7M Jul  7  2022 CACCACTAGGAGGCGA-1.temp1.bam
-rw-rw-r-- 1 sjp sjp  808 Jul  7  2022 CACCACTAGGAGGCGA-1.temp1.bam.bai
-rw-rw-r-- 1 sjp sjp 2.4M Jul  7  2022 CTAACTTAGAGCCACA-1.temp0.bam
-rw-rw-r-- 1 sjp sjp 2.4M Jul  7  2022 CTAACTTAGAGCCACA-1.temp1.bam
-rw-rw-r-- 1 sjp sjp  808 Jul  7  2022 CTAACTTAGAGCCACA-1.temp1.bam.bai
-rw-rw-r-- 1 sjp sjp 2.1M Jul  7  2022 GCCTAGGCAGTTCGGC-1.temp0.bam
-rw-rw-r-- 1 sjp sjp 2.1M Jul  7  2022 GCCTAGGCAGTTCGGC-1.temp1.bam
-rw-rw-r-- 1 sjp sjp  792 Jul  7  2022 GCCTAGGCAGTTCGGC-1.temp1.bam.bai
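
The listing above shows per-barcode matrices under temp/sparse_matrices, while final contains only chrM_refAllele.txt. A minimal sketch for checking this programmatically, assuming the layout named in the make_final_sparse_matrices rule of the log (final/<name>.<A|C|G|T|coverage>.txt.gz). The function name is hypothetical and is not part of mgatk:

```python
from pathlib import Path

# Suffixes taken from the make_final_sparse_matrices rule output in the log.
FINAL_SUFFIXES = ("A", "C", "G", "T", "coverage")


def missing_final_outputs(outdir, name):
    """Return the expected final matrix files that do not exist.

    Hypothetical helper (not part of mgatk). Assumes the layout seen in the
    log: <outdir>/final/<name>.<suffix>.txt.gz
    """
    final_dir = Path(outdir) / "final"
    expected = [final_dir / f"{name}.{suffix}.txt.gz" for suffix in FINAL_SUFFIXES]
    return [str(path) for path in expected if not path.is_file()]


# Example call for the run in this issue:
#   missing_final_outputs("bc1d", "bc1")
```

For the directory listed above, this would be expected to report all five files as missing, since none of them appear under bc1d/final.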

I really appreciate your kindness. Thanks.

ttab963 Jul 07 '22 09:07

Hm, strange... it's erroring out because it can't find a 'depth' file per barcode; can you see if this command works?

mgatk bcall -i barcode/test_barcode.bam -n bc1_tenx -o tenx_test -bt CB -b barcode/test_barcodes.txt -z 
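
One way to check the 'depth' file hypothesis directly is to compare the barcode list against the contents of qc/depth. The sketch below assumes the path layout from the log (qc/depth/<barcode>.depth.txt) and assumes the barcodes file holds one barcode per line. The helper name is hypothetical and is not part of mgatk:

```python
from pathlib import Path


def barcodes_missing_depth(outdir, barcodes_file):
    """Return barcodes that have no <outdir>/qc/depth/<barcode>.depth.txt.

    Hypothetical helper (not part of mgatk). Assumes one barcode per line
    in barcodes_file and the directory layout seen in the log.
    """
    depth_dir = Path(outdir) / "qc" / "depth"
    with open(barcodes_file) as handle:
        # Skip blank lines so a trailing newline does not count as a barcode.
        barcodes = [line.strip() for line in handle if line.strip()]
    return [bc for bc in barcodes if not (depth_dir / f"{bc}.depth.txt").is_file()]


# Example call for the run in this issue:
#   barcodes_missing_depth("bc1d", "barcode/test_barcodes.txt")
```

In the directory listing posted in the report, all three depth files are present once the run has ended, so an empty result there would suggest the files were written after the step that needed them had already failed.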

caleblareau Jul 08 '22 18:07

Thank you for your reply. Unfortunately, it shows the same error.

The interesting point is that if I run it again with the -z option into the same directory, it runs fine! I think some problem in snakemake prevents it from finding the files, or something like that. I tested this with snakemake versions 7.7.0 and 7.8.5.
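
The re-run workaround described above can be sketched as a small wrapper that runs the same command again until the expected files exist. This only mirrors the observation in this comment; it is not a fix confirmed by the maintainers. The function name, the `max_runs` parameter, and the `runner` parameter are hypothetical and not part of mgatk:

```python
import subprocess
from pathlib import Path


def run_until_outputs_exist(cmd, expected, max_runs=2, runner=subprocess.run):
    """Run cmd up to max_runs times, stopping once all expected files exist.

    Hypothetical wrapper (not part of mgatk). Returns the number of runs
    used and raises RuntimeError if the files are still missing afterwards.
    """
    for attempt in range(1, max_runs + 1):
        # check=False: a failing first run should not stop the second attempt.
        runner(cmd, check=False)
        if all(Path(path).is_file() for path in expected):
            return attempt
    raise RuntimeError(f"outputs still missing after {max_runs} runs")


# Example with the command and paths from this issue:
#   cmd = ["mgatk", "bcall", "-i", "barcode/test_barcode.bam", "-n", "bc1",
#          "-o", "bc1d", "-bt", "CB", "-b", "barcode/test_barcodes.txt", "-z"]
#   run_until_outputs_exist(cmd, ["bc1d/final/bc1.coverage.txt.gz"])
```

The `runner` argument exists only so the retry logic can be exercised without mgatk installed, by passing a stand-in callable.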

Also, can I use this tool on 10x single-end (3' or 5') scRNA-seq data? In that case the filtering method used in this tool (strand bias) would have to be changed, but I'm not sure whether I can do that.

ttab963 Jul 15 '22 02:07