too few IDs in DROP_GROUP mae, please ensure that it has at least 1 IDs, groups: []

Open SpaceCropTechnologies opened this issue 1 year ago • 18 comments

Good day,

I am trying to run the Monoallelic Expression (MAE) module, but I am receiving this error:

too few IDs in DROP_GROUP mae, please ensure that it has at least 1 IDs, groups: []

I can run the Splicing and AberrantExpression modules successfully; I am only having problems with MAE.

This is my mae config:

mae:
  run: true
  groups:
    - mae
  gatkIgnoreHeaderCheck: true
  padjCutoff: 0.05
  allelicRatioCutoff: 0.8
  addAF: true
  maxAF: 0.001
  maxVarFreqCohort: 0.05
  # VCF-BAM matching
  qcVcf: /work/users/pz192nijo/Projects/Archive/DROP3.DEMO/Data/qc_vcf_1000G.vcf.gz
  qcGroups:
    - mae
  dnaRnaMatchCutoff: 0.85

[image] The image shows my sample annotation.

P.S. Running snakemake sampleAnnotation also gives an error.

SpaceCropTechnologies avatar May 12 '23 13:05 SpaceCropTechnologies

It could be that the - mae entry under the groups parameter of the mae dictionary is not indented. It should be:

groups:
  - mae

vyepez88 avatar May 12 '23 13:05 vyepez88

Hello. It is still me; I just used the wrong account for posting, but yes, that is my current problem. With the spacing as suggested, I still get the same error.

lbundalian avatar May 12 '23 13:05 lbundalian

can you try:

groups: null

vyepez88 avatar May 13 '23 12:05 vyepez88

Hello. Good day! It produces the same error with groups: null

lbundalian avatar May 14 '23 12:05 lbundalian

Can you double-check that all BAM files exist?
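One way to run that check is to read the sample annotation and test every referenced path. This is only a minimal sketch: it assumes the annotation is a tab-separated file, that the file columns are named RNA_BAM_FILE and DNA_VCF_FILE (adjust to your own header), and that DROP_GROUP may hold several comma-separated groups. The function name missing_files is hypothetical and not part of DROP.

```python
import csv
import os


def missing_files(annotation_path, group, columns=("RNA_BAM_FILE", "DNA_VCF_FILE")):
    """Return (RNA_ID, column, path) for files of `group` samples that do not exist.

    Assumptions: tab-separated annotation, column names as given above.
    """
    missing = []
    with open(annotation_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # DROP_GROUP may list several groups separated by commas
            groups = [g.strip() for g in (row.get("DROP_GROUP") or "").split(",")]
            if group not in groups:
                continue
            for column in columns:
                path = (row.get(column) or "").strip()
                # Empty cells are skipped; only non-empty paths are tested
                if path and not os.path.isfile(path):
                    missing.append((row.get("RNA_ID") or "", column, path))
    return missing
```

Calling missing_files("sample_annotation.tsv", "mae") with your own annotation path lists every sample in the mae group whose BAM or VCF path does not point to an existing file, which makes a misspelled file name easy to spot.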

vyepez88 avatar May 14 '23 12:05 vyepez88

Ok Thanks I will

lbundalian avatar May 14 '23 12:05 lbundalian

It is working now; one of the file names was misspelled. However, I am now getting this:

[Sun May 14 15:12:35 2023]
Error in rule mae_createSNVs:
    jobid: 52
    input: /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/31800SL_S26-gatk-haplotype.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3032142_RAligned.sortedByCoord.out.bam, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/params/snvs/3032142_snvParams.csv
    output: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3032142--3032142.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3032142--3032142.vcf.gz.tbi
    shell:

    /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/31800SL_S26-gatk-haplotype.vcf.gz         3032142 /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3032142_RAligned.sortedByCoord.out.bam /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3032142--3032142.vcf.gz         bcftools samtools
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Sun May 14 15:12:35 2023]
rule mae_createSNVs:
    input: /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt, /work/users/pz192nijo/Projects/Archive/DROP3.DEMO/Data/qc_vcf_1000G.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3029387_RAligned.sortedByCoord.out.bam, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/params/snvs/3029387_snvParams.csv
    output: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3029387.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3029387.vcf.gz.tbi
    jobid: 105
    reason: Missing output files: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3029387.vcf.gz
    wildcards: vcf=QC, rna=3029387
    resources: tmpdir=/tmp

[Sun May 14 15:12:47 2023]
Finished job 111.
1 of 109 steps (1%) done
[Sun May 14 15:12:47 2023]
Finished job 69.
2 of 109 steps (2%) done
[Sun May 14 15:12:47 2023]
Finished job 84.
3 of 109 steps (3%) done
[Sun May 14 15:12:47 2023]
Finished job 105.
4 of 109 steps (4%) done
[Sun May 14 15:12:48 2023]
Finished job 96.
5 of 109 steps (5%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-05-14T151231.826343.snakemake.log

This only shows that a job failed (jobid: 52); I can't pinpoint the actual cause because the log gives no description of the error.

lbundalian avatar May 14 '23 13:05 lbundalian

There might be something wrong with this VCF file: 31800SL_S26-gatk-haplotype.vcf.gz. Make sure that the sample ID inside the VCF file is the same one you specified in the DNA_ID column of the sample annotation.
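The sample IDs stored in a VCF are the column names after FORMAT on the #CHROM header line, so they can be compared with DNA_ID directly. The following is a minimal sketch, not DROP code: the helper names are hypothetical, and it relies on Python's gzip module, which can also read bgzip output because a BGZF file is a series of gzip members.

```python
import gzip


def vcf_sample_ids(vcf_path):
    """Return the sample IDs from the #CHROM header line of a (b)gzipped VCF."""
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed (CHROM ... FORMAT); samples follow
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached the records without finding a header line
    return []


def dna_id_in_vcf(vcf_path, dna_id):
    """Check whether `dna_id` is one of the samples in the VCF."""
    # Compare as strings so that a numeric-looking ID such as 3021163 still matches
    return str(dna_id) in vcf_sample_ids(vcf_path)
```

For the file above, dna_id_in_vcf(path_to_vcf, "31800SL_S26") would tell you whether the ID written in the sample annotation actually occurs in the VCF header.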

vyepez88 avatar May 15 '23 06:05 vyepez88

Hello. I have checked but I think it is still the same error:

Error in rule mae_createSNVs:
    jobid: 28
    input: /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/3021163-gatk-haplotype.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3032087_RAligned.sortedByCoord.out.bam, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/params/snvs/3032087_snvParams.csv
    output: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3021163--3032087.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3021163--3032087.vcf.gz.tbi
    shell:

    /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/3021163-gatk-haplotype.vcf.gz         3021163 /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3032087_RAligned.sortedByCoord.out.bam /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3021163--3032087.vcf.gz         bcftools samtools
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Filter SNVs
Failed to read from /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/3022344-gatk-haplotype.vcf.gz: not compressed with bgzip
Failed to read from standard input: unknown file type
Failed to read from standard input: unknown file type
Failed to read from standard input: unknown file type
[Mon May 15 11:16:09 2023]
Error in rule mae_createSNVs:
    jobid: 19
    input: /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/3022344-gatk-haplotype.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3030025_RAligned.sortedByCoord.out.bam, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/params/snvs/3030025_snvParams.csv
    output: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3022344--3030025.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3022344--3030025.vcf.gz.tbi
    shell:

    /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/filterSNVs.sh /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt /work/users/pz192nijo/Projects/Schubert.DROP3/vcf/3022344-gatk-haplotype.vcf.gz         3022344 /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3030025_RAligned.sortedByCoord.out.bam /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/3022344--3030025.vcf.gz         bcftools samtools
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

lbundalian avatar May 15 '23 09:05 lbundalian

I had an error like this. There is a difference between gzipped and bgzipped files. The way I fixed this was uncompressing and recompressing in the requested format. Something like

gunzip file.vcf.gz
bgzip file.vcf

https://github.com/samtools/bcftools/issues/668
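To find out which VCFs need recompressing before running the pipeline, you can inspect the file header. The sketch below assumes the BGZF layout described in the SAM specification: a gzip header with the FEXTRA flag set and an extra subfield tagged "BC" of length 2. The function name is_bgzf is hypothetical.

```python
import struct


def is_bgzf(path):
    """Return True if the file starts with a BGZF block (bgzip output)."""
    with open(path, "rb") as handle:
        header = handle.read(12)
        if len(header) < 12:
            return False
        id1, id2, method, flags = header[0], header[1], header[2], header[3]
        # gzip magic bytes, deflate method, and the FEXTRA flag must be present
        if (id1, id2, method) != (0x1F, 0x8B, 8) or not flags & 0x04:
            return False
        (xlen,) = struct.unpack("<H", header[10:12])
        extra = handle.read(xlen)
    # Walk the extra subfields looking for the 'BC' subfield of length 2
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack("<BBH", extra[pos:pos + 4])
        if (si1, si2, slen) == (ord("B"), ord("C"), 2):
            return True
        pos += 4 + slen
    return False
```

A file written by plain gzip has no such subfield, so is_bgzf returns False for it even though the file name also ends in .vcf.gz.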

mincej avatar May 15 '23 15:05 mincej

OK, I tried another suggestion I had seen, but it didn't work, so I will give this one a try. Thank you.

lbundalian avatar May 15 '23 15:05 lbundalian

[image]

This is the error now, after recompressing the files.

lbundalian avatar May 17 '23 02:05 lbundalian

Hard to say, for some reason there's a formatting error in that vcf file. Did other samples run through?

vyepez88 avatar May 17 '23 05:05 vyepez88

None of them ran. Now I am getting this error:

ERROR: No allele-specific counts
Make sure that the chromosome styles of the FASTA reference and BAM file match. If that isn't the issue, check that your VCF and BAM files are correctly formatted. If this problem persists and if this is your only sample causing issues, consider removing it from your analysis, as a last resort.

MAE ID: QC--3030732
VCF file: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3030732.vcf.gz
BAM file: /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3030732_RAligned.sortedByCoord.out.bam
FASTA file: /work/users/pz192nijo/Database/GenomeDB/GRCh38/GRCh38.primary_assembly.genome.fa
Additionally the ReadGroups may be poorly formed. Please refer to https://gagneurlab-drop.readthedocs.io/en/latest/help.html for more information
[Wed May 17 07:29:04 2023]
Error in rule mae_allelicCounts:
    jobid: 26
    input: /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt, /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3030732.vcf.gz, /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3030732_RAligned.sortedByCoord.out.bam, /work/users/pz192nijo/Database/GenomeDB/GRCh38/GRCh38.primary_assembly.genome.fa, /work/users/pz192nijo/Database/GenomeDB/GRCh38/GRCh38.primary_assembly.genome.dict, /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/ASEReadCounter.sh
    output: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/allelic_counts/QC--3030732.csv.gz
    shell:

    /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/MonoallelicExpression/pipeline/MAE/ASEReadCounter.sh /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_NCBI_UCSC.txt /work/users/pz192nijo/Projects/Schubert.DROP3/Scripts/Pipeline/chr_UCSC_NCBI.txt         /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3030732.vcf.gz /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3030732_RAligned.sortedByCoord.out.bam QC--3030732         /work/users/pz192nijo/Database/GenomeDB/GRCh38/GRCh38.primary_assembly.genome.fa True /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/allelic_counts/QC--3030732.csv.gz         bcftools samtools gatk
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job mae_allelicCounts since they might be corrupted: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/allelic_counts/QC--3030732.csv.gz

lbundalian avatar May 17 '23 05:05 lbundalian

Can you check that the chromosome naming styles (e.g. chr1 vs. 1) of these files match:

VCF file: /work/users/pz192nijo/Projects/Schubert.DROP3/Output/processed_data/mae/snvs/QC--3030732.vcf.gz
BAM file: /work/users/pz192nijo/Projects/Schubert.DROP3/bam/SS_3030732_RAligned.sortedByCoord.out.bam
FASTA file: /work/users/pz192nijo/Database/GenomeDB/GRCh38/GRCh38.primary_assembly.genome.fa
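A quick way to compare the styles is to collect the contig names from each file and classify them. This is a rough sketch with hypothetical helper names: it labels names starting with "chr" as UCSC style and everything else as NCBI style, reads the FASTA as uncompressed text, and leaves the BAM side to you (its contig names can be listed from the @SQ lines printed by samtools view -H and passed to chrom_style as a list).

```python
import gzip


def chrom_style(names):
    """Classify contig names as 'UCSC' (chr1), 'NCBI' (1), 'mixed' or 'unknown'."""
    names = [n for n in names if n]
    if not names:
        return "unknown"
    with_prefix = sum(1 for n in names if n.startswith("chr"))
    if with_prefix == len(names):
        return "UCSC"
    if with_prefix == 0:
        return "NCBI"
    return "mixed"


def fasta_contigs(fasta_path):
    """Contig names from the '>' header lines of an uncompressed FASTA file."""
    with open(fasta_path) as handle:
        return [
            line[1:].split()[0]
            for line in handle
            if line.startswith(">") and line[1:].strip()
        ]


def vcf_contigs(vcf_path):
    """Distinct CHROM values from the records of a (b)gzipped VCF, in file order."""
    seen = []
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue
            chrom = line.split("\t", 1)[0]
            if chrom not in seen:
                seen.append(chrom)
    return seen
```

If chrom_style(fasta_contigs(fasta)) and chrom_style(vcf_contigs(vcf)) return different labels, or either returns "mixed", the files do not share one naming style.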

vyepez88 avatar May 17 '23 05:05 vyepez88

This is my error now: [image]

lbundalian avatar May 20 '23 05:05 lbundalian

Why would it be different if they have the same source RNA_ID?

lbundalian avatar May 20 '23 06:05 lbundalian

That problem arose because either the RNA_IDs or the DNA_IDs were numeric. This is now fixed in the dev branch.

vyepez88 avatar Jun 02 '23 17:06 vyepez88