ABC-Enhancer-Gene-Prediction

Index Error during Step 2

Open londonlondon20 opened this issue 4 years ago • 5 comments

Hello, I'd appreciate advice on troubleshooting this index error. Also, is NPC a valid input for cell type? If not, what are the acceptable inputs? Lastly, is there a better way to contact you about issues with ABC than GitHub? If so, please let me know.

python3.6 src/run.neighborhoods.py \
--candidate_enhancer_regions /apps/a/output/D15_rep1_atac.macs2_peaks.narrowPeak.sorted.candidateRegions.bed \
--genes /apps/a/RefSeqUniqueGenesFinal5nodupes.bed \
--H3K27ac /apps/a/d15_k27_1.2777_4.R1.fq.gz.sorted.bam \
--DHS /apps/a/D15_rep1_atac.bam \
--chrom_sizes /apps/a/chr \
--cellType NPC \
--outdir /apps/a/output-neighborhoods28/

Script output

Running: bedtools bamtobed -i /apps/a/d15_k27_1.2777_4.R1.fq.gz.sorted.bam | cut -f 1-3 | bedtools intersect -wa -a stdin -b /apps/a/chr.bed | bedtools sort -i stdin -faidx /apps/a/chr | bedtools coverage -g /apps/a/chr -counts -sorted -a /apps/a/output-neighborhoods28/GeneList.bed -b stdin | awk '{print $1 "\t" $2 "\t" $3 "\t" $NF}' > /apps/a/output-neighborhoods28/Genes.H3K27ac.d15_k27_1.2777_4.R1.fq.gz.sorted.bam.CountReads.bedgraph
No columns to parse from file
b'***** ERROR: illegal character '\r' found in integer conversion of string "248956422\r". Exiting...\n'
BEDTools failed to count file: /apps/a/d15_k27_1.2777_4.R1.fq.gz.sorted.bam

Traceback (most recent call last):
  File "src/run.neighborhoods.py", line 97, in <module>
    main(args)
  File "src/run.neighborhoods.py", line 93, in main
    processCellType(args)
  File "src/run.neighborhoods.py", line 74, in processCellType
    outdir = args.outdir)
  File "/apps/a/ABC-Enhancer-Gene-Prediction-0.2.2/src/neighborhoods.py", line 89, in annotate_genes_with_features
    genes = count_features_for_bed(genes, bounds_bed, genome_sizes, features, outdir, "Genes", force=force, use_fast_count=use_fast_count)
  File "/apps/a/ABC-Enhancer-Gene-Prediction-0.2.2/src/neighborhoods.py", line 415, in count_features_for_bed
    df = count_single_feature_for_bed(df, bed_file, genome_sizes, feature_bam, feature, directory, filebase, skip_rpkm_quantile, force, use_fast_count)
  File "/apps/a/ABC-Enhancer-Gene-Prediction-0.2.2/src/neighborhoods.py", line 435, in count_single_feature_for_bed
    domain_counts = read_bed(feature_outfile)
  File "/apps/a/ABC-Enhancer-Gene-Prediction-0.2.2/src/neighborhoods.py", line 483, in read_bed
    assert result.columns[0] == "chr"
  File "/home/a/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4097, in __getitem__
    return getitem(key)
IndexError: index 0 is out of bounds for axis 0 with size 0

londonlondon20 • Sep 19 '20 23:09

I think your /apps/a/chr file may be in DOS format (\r\n instead of \n for line endings): the string in the error, "248956422\r", is the length of chromosome 1 followed by a stray carriage return.
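One way to check for and fix this is sketched below in Python. The helper names has_crlf and strip_carriage_returns are made up for illustration and are not part of ABC; the sketch only assumes the chromosome sizes file is a small text file that fits in memory.

```python
from pathlib import Path


def has_crlf(path):
    """Return True if the file contains any carriage-return bytes."""
    return b"\r" in Path(path).read_bytes()


def strip_carriage_returns(src, dst):
    """Copy src to dst, converting CRLF (and lone CR) line endings to LF."""
    data = Path(src).read_bytes()
    # Handle CRLF first so a Windows line ending becomes one newline, not two.
    cleaned = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    Path(dst).write_bytes(cleaned)


if __name__ == "__main__":
    # Hypothetical paths: substitute your own chromosome sizes file.
    sizes = "/apps/a/chr"
    if has_crlf(sizes):
        strip_carriage_returns(sizes, sizes + ".unix")
        print("Wrote", sizes + ".unix")
```

Writing to a new file rather than overwriting keeps the original around until the rerun succeeds; the same conversion is needed for any other input that was edited on Windows.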

thouis • Sep 23 '20 12:09

Thanks, but after fixing that, it yields a similar error:

Running: bedtools bamtobed -i /apps/j/D15_rep1_atac.bam | cut -f 1-3 | bedtools intersect -wa -a stdin -b /apps/j/chr.bed | bedtools sort -i stdin -faidx /apps/j/chr | bedtools coverage -g /apps/j/chr -counts -sorted -a /apps/j/output-neighborhoods29/GeneList.bed -b stdin | awk '{print $1 "\t" $2 "\t" $3 "\t" $NF}' > /apps/j/output-neighborhoods29/Genes.DHS.D15_rep1_atac.bam.CountReads.bedgraph
No columns to parse from file
b"terminate called after throwing an instance of 'std::bad_alloc'\n what(): std::bad_alloc\nError: Sorted input specified, but the file /apps/j/output-neighborhoods29/GeneList.bed has the following out of order record\nchr1\t179099329\t179229693\tABL2\t0\t-\n"
BEDTools failed to count file: /apps/j/D15_rep1_atac.bam

terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
Error: Sorted input specified, but the file /apps/j/output-neighborhoods29/GeneList.bed has the following out of order record
chr1 179099329 179229693 ABL2 0 -

Traceback (most recent call last):
  File "src/run.neighborhoods.py", line 97, in <module>
    main(args)
  File "src/run.neighborhoods.py", line 93, in main
    processCellType(args)
  File "src/run.neighborhoods.py", line 74, in processCellType
    outdir = args.outdir)
  File "/apps/j/ABC-Enhancer-Gene-Prediction-0.2.2/src/neighborhoods.py", line 89, in annotate_genes_with_features
    genes = count_features_for_bed(genes, bounds_bed, genome_sizes, features, outdir, "Genes", force=force, use_fast_count=use_fast_count)
  File "/apps/j/ABC-Enhancer-Gene-Prediction-0.2.2/src/neighborhoods.py", line 415, in count_features_for_bed
    df = count_single_feature_for_bed(df, bed_file, genome_sizes, feature_bam, feature, directory, filebase, skip_rpkm_quantile, force, use_fast_count)
  File "/apps/j/ABC-Enhancer-Gene-Prediction-0.2.2/src/neighborhoods.py", line 435, in count_single_feature_for_bed
    domain_counts = read_bed(feature_outfile)
  File "/apps/j/ABC-Enhancer-Gene-Prediction-0.2.2/src/neighborhoods.py", line 483, in read_bed
    assert result.columns[0] == "chr"
  File "/home/j/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 4097, in __getitem__
    return getitem(key)
IndexError: index 0 is out of bounds for axis 0 with size 0

londonlondon20 • Sep 24 '20 03:09

Could you try these two things (together):

  • remove any files in /apps/j/output-neighborhoods29/
  • check out the use_pysam branch of the repository to see if it fixes the memory error.

thouis • Sep 24 '20 11:09

Okay, I used the use_pysam branch, but now I get a different error, shown below:

python3.6 src/run.neighborhoods.py --candidate_enhancer_regions /apps/user/output/D15_rep1_atac.macs2_peaks.narrowPeak.sorted.candidateRegions.bed --genes /apps/user/RefSeqUniqueGenesFinal5nodupes.bed --H3K27ac /apps/user/d15_k27_1.2777_4.R1.fq.gz.sorted.bam --DHS /apps/user/D15_rep1_atac.bam --chrom_sizes /apps/user/chr --cellType NPC --outdir /apps/user/output-neighborhoods210/

Namespace(ATAC='', DHS='/apps/user/D15_rep1_atac.bam', H3K27ac='/apps/user/d15_k27_1.2777_4.R1.fq.gz.sorted.bam', candidate_enhancer_regions='/apps/user/output/D15_rep1_atac.macs2_peaks.narrowPeak.sorted.candidateRegions.bed', cellType='NPC', chrom_sizes='/apps/user/chr', default_accessibility_feature=None, enhancer_class_override=None, expression_table='', gene_name_annotations='symbol', genes='/apps/user/RefSeqUniqueGenesFinal5nodupes.bed', genes_for_class_assignment=None, outdir='/apps/user/output-neighborhoods210/', primary_gene_identifier='symbol', qnorm=None, skip_gene_counts=False, skip_rpkm_quantile=False, supplementary_features=None, tss_slop_for_class_assignment=500, ubiquitously_expressed_genes=None, use_secondary_counting_method=False)
Running command: bedtools sort -faidx /apps/user/chr -i /apps/user/output-neighborhoods210/GeneList.TSS1kb.bed > /apps/user/output-neighborhoods210/GeneList.TSS1kb.bed.sorted; mv /apps/user/output-neighborhoods210/GeneList.TSS1kb.bed.sorted /apps/user/output-neighborhoods210/GeneList.TSS1kb.bed
Regenerating /apps/user/output-neighborhoods210/Genes.H3K27ac.d15_k27_1.2777_4.R1.fq.gz.sorted.bam.CountReads.bedgraph
Counting coverage for Genes.H3K27ac.d15_k27_1.2777_4.R1.fq.gz.sorted.bam
[E::idx_find_and_load] Could not retrieve index file for '/apps/user/d15_k27_1.2777_4.R1.fq.gz.sorted.bam'
Traceback (most recent call last):
  File "src/run.neighborhoods.py", line 97, in <module>
    main(args)
  File "src/run.neighborhoods.py", line 93, in main
    processCellType(args)
  File "src/run.neighborhoods.py", line 74, in processCellType
    outdir = args.outdir)
  File "/apps/user/ABC-Enhancer-Gene-Prediction-use_pysam/src/neighborhoods.py", line 90, in annotate_genes_with_features
    genes = count_features_for_bed(genes, bounds_bed, genome_sizes, features, outdir, "Genes", force=force, use_fast_count=use_fast_count)
  File "/apps/user/ABC-Enhancer-Gene-Prediction-use_pysam/src/neighborhoods.py", line 361, in count_features_for_bed
    df = count_single_feature_for_bed(df, bed_file, genome_sizes, feature_bam, feature, directory, filebase, skip_rpkm_quantile, force, use_fast_count)
  File "/apps/user/ABC-Enhancer-Gene-Prediction-use_pysam/src/neighborhoods.py", line 377, in count_single_feature_for_bed
    run_count_reads(feature_bam, feature_outfile, bed_file, genome_sizes, use_fast_count)
  File "/apps/user/ABC-Enhancer-Gene-Prediction-use_pysam/src/neighborhoods.py", line 303, in run_count_reads
    count_bam(target, bed_file, output, genome_sizes=genome_sizes, use_fast_count=use_fast_count)
  File "/apps/user/ABC-Enhancer-Gene-Prediction-use_pysam/src/neighborhoods.py", line 318, in count_bam
    counts = [(reads.count(row.chr, row.start, row.end) if (row.chr in read_chrs) else 0) for _, row in bed_regions.iterrows()]
  File "/apps/user/ABC-Enhancer-Gene-Prediction-use_pysam/src/neighborhoods.py", line 318, in <listcomp>
    counts = [(reads.count(row.chr, row.start, row.end) if (row.chr in read_chrs) else 0) for _, row in bed_regions.iterrows()]
  File "pysam/libcalignmentfile.pyx", line 1425, in pysam.libcalignmentfile.AlignmentFile.count
  File "pysam/libcalignmentfile.pyx", line 1093, in pysam.libcalignmentfile.AlignmentFile.fetch
ValueError: fetch called on bamfile without index

londonlondon20 • Sep 27 '20 01:09

Sorry, should have been clearer. You'll need to sort each bam by position and run samtools index on it.
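As a rough sketch of that step, the Python below builds and runs the two samtools commands for one BAM. It assumes samtools is installed and on PATH; the function names and the example output filename are hypothetical, chosen only for illustration.

```python
import subprocess


def sort_and_index_commands(bam_path, sorted_path):
    """Build the samtools commands that coordinate-sort a BAM and index it."""
    return [
        # samtools sort orders reads by position unless told otherwise.
        ["samtools", "sort", "-o", sorted_path, bam_path],
        # samtools index writes the index next to the sorted BAM.
        ["samtools", "index", sorted_path],
    ]


def sort_and_index(bam_path, sorted_path):
    """Run both commands, raising CalledProcessError if either one fails."""
    for cmd in sort_and_index_commands(bam_path, sorted_path):
        subprocess.run(cmd, check=True)
    return sorted_path


if __name__ == "__main__":
    # Hypothetical output name: adjust paths to your own data.
    sort_and_index("/apps/user/D15_rep1_atac.bam",
                   "/apps/user/D15_rep1_atac.sorted.bam")
```

The sorted BAM, not the original, is then the file to pass to --DHS or --H3K27ac, since the index sits alongside the sorted file.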


thouis • Sep 27 '20 02:09

We've revamped the codebase. Please check out https://github.com/broadinstitute/ABC-Enhancer-Gene-Prediction/tree/main and reopen this issue if the problem persists.

atancoder • Dec 07 '23 23:12