mudskipper icon indicating copy to clipboard operation
mudskipper copied to clipboard

'main' panicked at 'index out of bounds: the len is 1 but the index is 1'

Open aleighbrown opened this issue 3 years ago • 22 comments

Built the index first using gencode v40

mudskipper index --gtf /SAN/vyplab/vyplab_reference_genomes/annotation/human/GRCh38/gencode.v40.annotation.gtf --dir-index /SAN/vyplab/vyplab_reference_genomes/mudskipper_indices/human.gencode.v40

And then tried to convert with

mudskipper bulk --index /SAN/vyplab/vyplab_reference_genomes/mudskipper_indices/human.gencode.v40/ --alignment my.sorted.bam --out my.mud.bam --threads 4

However this freaks out immediately with:


[2022-05-02T12:43:07Z INFO  mudskipper] Mudskipper started...
[2022-05-02T12:43:07Z INFO  mudskipper::annotation] Loading parsed GTF...
[2022-05-02T12:43:08Z INFO  mudskipper::bam] Number of reference sequences: 246624
[2022-05-02T12:43:09Z INFO  mudskipper::bam] thread count: 4 in total
[2022-05-02T12:43:09Z INFO  mudskipper::bam] thread count: 2 for reading
[2022-05-02T12:43:09Z INFO  mudskipper::bam] thread count: 2 for writing
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/query_bam_records.rs:165:33
stack backtrace:
   0:     0x5599e7546c2d - std::backtrace_rs::backtrace::libunwind::trace::hee598835bc88d35b
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x5599e7546c2d - std::backtrace_rs::backtrace::trace_unsynchronized::h9cdc730ba5cf5d72
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x5599e7546c2d - std::sys_common::backtrace::_print_fmt::h75aeaf7ed30e43fa
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:66:5
   3:     0x5599e7546c2d - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h606862f787600875
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:45:22
   4:     0x5599e7569a9c - core::fmt::write::he803f0f418caf762
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/fmt/mod.rs:1190:17
   5:     0x5599e7542fc8 - std::io::Write::write_fmt::h70bc45872f37e7bb
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/io/mod.rs:1657:15
   6:     0x5599e7548cd7 - std::sys_common::backtrace::_print::h64d038cf8ac3e13e
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:48:5
   7:     0x5599e7548cd7 - std::sys_common::backtrace::print::h359300b4a7fccf65
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:35:9
   8:     0x5599e7548cd7 - std::panicking::default_hook::{{closure}}::hf51be35e2f510149
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:295:22
   9:     0x5599e75489a0 - std::panicking::default_hook::h03ca0f22e1d2d25e
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:314:9
  10:     0x5599e7549429 - std::panicking::rust_panic_with_hook::h3b7380e99b825b63
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:698:17
  11:     0x5599e7549117 - std::panicking::begin_panic_handler::{{closure}}::h8e849d0710154ce0
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:588:13
  12:     0x5599e75470f4 - std::sys_common::backtrace::__rust_end_short_backtrace::hedcdaddbd4c46cc5
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:138:18
  13:     0x5599e7548e29 - rust_begin_unwind
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
  14:     0x5599e737dcb3 - core::panicking::panic_fmt::he1bbc7336d49a357
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
  15:     0x5599e737dbf2 - core::panicking::panic_bounds_check::h7777e18766a27b0b
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:85:5
  16:     0x5599e73a4d1f - mudskipper::query_bam_records::BAMQueryRecordReader::group_records::h3e557bbfbf98c5b3
  17:     0x5599e73a55cd - mudskipper::query_bam_records::BAMQueryRecordReader::get_next_query_records::hd44f4b13b6418ce6
  18:     0x5599e73946dd - mudskipper::bam::bam2bam::h107271027deeb930
  19:     0x5599e739069c - mudskipper::main::hd9af9ad03c0768b5
  20:     0x5599e7393d23 - std::sys_common::backtrace::__rust_begin_short_backtrace::h6d04830762f92fce
  21:     0x5599e73a6249 - std::rt::lang_start::{{closure}}::h3a11e2c99d3705be
  22:     0x5599e7546311 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hb7014f43484a8b4e
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:259:13
  23:     0x5599e7546311 - std::panicking::try::do_call::h7bc9dc436daeb8c7
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:492:40
  24:     0x5599e7546311 - std::panicking::try::h653d68a27ff5f175
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:456:19
  25:     0x5599e7546311 - std::panic::catch_unwind::h9d739f9f59895e68
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panic.rs:137:14
  26:     0x5599e7546311 - std::rt::lang_start_internal::{{closure}}::hf006f2bc7ce22bbe
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/rt.rs:128:48
  27:     0x5599e7546311 - std::panicking::try::do_call::hfb39d6df61a2e69f
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:492:40
  28:     0x5599e7546311 - std::panicking::try::h13e2d225134958ac
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:456:19
  29:     0x5599e7546311 - std::panic::catch_unwind::h3bd49b5a5dfb1a50
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panic.rs:137:14
  30:     0x5599e7546311 - std::rt::lang_start_internal::h2ba92edce36c035e
                               at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/rt.rs:128:20
  31:     0x5599e7391632 - main
  32:     0x2b9c394ed555 - __libc_start_main
  33:     0x5599e737e4e9 - <unknown>
  34:                0x0 - <unknown>

I get the same error on a few other BAMs I've tried as well

And the same error persist if I do this

mudskipper bulk --gtf /SAN/vyplab/vyplab_reference_genomes/annotation/human/GRCh38/gencode.v40.annotation.gtf --alignment my.sorted.bam --out my.mud.bam --threads 4

aleighbrown avatar May 02 '22 12:05 aleighbrown

Hi @aleighbrown,

Thank you for reporting this issue! Would you be able to share the BAM file that is causing this issue, or at least a small subset of it? This will help greatly in debugging the problem.

Thanks, Rob

rob-p avatar May 05 '22 14:05 rob-p

I am also getting this error - see details below. Hope this helps with debugging.

rust setup

rustup show

Default host: x86_64-unknown-linux-gnu
rustup home:  /nfs/scistore16/itgrp/jelbers/.rustup

installed toolchains
--------------------

stable-x86_64-unknown-linux-gnu (default)
nightly-x86_64-unknown-linux-gnu

active toolchain
----------------

stable-x86_64-unknown-linux-gnu (default)
rustc 1.60.0 (7737e0b5c 2022-04-04)

mudskipper installation

cd ~/git/
git clone https://github.com/OceanGenomics/mudskipper
cd mudskipper
git show

commit f6382774aa84332719cd545cffd776604c2bc79a (HEAD -> main, origin/main, origin/HEAD)

cargo build --release

get GTF file and genome

wget ftp://ftp.ensemblgenomes.org/pub/metazoa/release-46/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.46.gtf.gz
gunzip Caenorhabditis_elegans.WBcel235.46.gtf.gz
wget ftp://ftp.ensemblgenomes.org/pub/metazoa/release-46/fasta/caenorhabditis_elegans/dna/Caenorhabditis_elegans.WBcel235.dna.toplevel.fa.gz
gunzip Caenorhabditis_elegans.WBcel235.dna.toplevel.fa.gz

STAR index genome

module load seqtk/20210125
# make sequences upper case (remove soft-masking)
seqtk seq -CU Caenorhabditis_elegans.WBcel235.dna.toplevel.fa > Caenorhabditis_elegans.WBcel235.dna.toplevel_uppercase.fa

module load STAR/2.7.9a
mkdir -p Caenorhabditis_elegans.WBcel235.dna.toplevel_Ensembl46_STAR-2.7.9a_sjdbOverhang18
STAR \
--genomeSAindexNbases 12 \
--limitGenomeGenerateRAM 100000000000 \
--runThreadN 34 \
--sjdbOverhang 18 \
--runMode genomeGenerate \
--genomeDir Caenorhabditis_elegans.WBcel235.dna.toplevel_Ensembl46_STAR-2.7.9a_sjdbOverhang18 \
--genomeFastaFiles Caenorhabditis_elegans.WBcel235.dna.toplevel_uppercase.fa \
--sjdbGTFfile Caenorhabditis_elegans.WBcel235.46.gtf

STAR mapping

module load STAR/2.7.9a
STAR  --runThreadN 32 \
--genomeDir auxData/Caenorhabditis_elegans.WBcel235.dna.toplevel_Ensembl46_STAR-2.7.9a_sjdbOverhang18 \
--readFilesCommand zcat \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix mapped/ref/RNAseq.MUT1.rep4 \
--outTmpDir TMP/ref_RNAseq.MUT1.rep4/TEMP/ \
--outReadsUnmapped Fastx \
--readFilesIn ../../analysis.1/quantification/fastq/RNAseq.MUT1.rep4-trimmed.mate1.fastq.gz \
../../analysis.1/quantification/fastq/RNAseq.MUT1.rep4-trimmed.mate2.fastq.gz

Get only first two-alignments and header

samtools view -h mapped/ref/RNAseq.MUT1.rep4Aligned.sortedByCoord.out.bam|head -n 13|samtools view -Sb > test.bam

mudskipper index GTF

export PATH="/nfs/scistore16/itgrp/jelbers/git/mudskipper/target/release:$PATH"
mudskipper index --gtf auxData/Caenorhabditis_elegans.WBcel235.46.gtf --dir-index auxData/mudskipper

[2022-05-12T06:02:52Z INFO  mudskipper] Mudskipper started...
[2022-05-12T06:02:52Z INFO  mudskipper::annotation] reading the gtf file and building the tree.
[2022-05-12T06:02:59Z INFO  mudskipper::annotation] building the tree
[2022-05-12T06:02:59Z INFO  mudskipper::annotation] Time to build the tree: 7.234329464s
[2022-05-12T06:02:59Z INFO  mudskipper::annotation] saving the GTF index
[2022-05-12T06:03:03Z INFO  mudskipper::annotation] Done with saving the GTF index
[2022-05-12T06:03:03Z INFO  mudskipper] Mudskipper finished.

mudskipper bulk

mudskipper bulk --threads 32 --alignment test.bam --index auxData/mudskipper --out alignments.bam

[2022-05-12T06:14:27Z INFO  mudskipper] Mudskipper started...
[2022-05-12T06:14:27Z INFO  mudskipper::annotation] Loading parsed GTF...
[2022-05-12T06:14:28Z INFO  mudskipper::bam] Number of reference sequences: 61451
[2022-05-12T06:14:28Z INFO  mudskipper::bam] thread count: 32 in total
[2022-05-12T06:14:28Z INFO  mudskipper::bam] thread count: 16 for reading
[2022-05-12T06:14:28Z INFO  mudskipper::bam] thread count: 16 for writing
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/query_bam_records.rs:165:33
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

zip up files for help

zip mudskipper-help.zip test.bam auxData/mudskipper/ -r

download zip file with wget

wget https://www.dropbox.com/s/5f9vrzs2yxx8yvn/mudskipper-help.zip

jelber2 avatar May 12 '22 06:05 jelber2

Awesome! Thanks for the reproducibility data. We'll take a look at what could be going on here. Cc @HosseinAsghari @gmarcais.

rob-p avatar May 13 '22 01:05 rob-p

I'm seeing the same thing, but with index out of bounds: the len is 3 but the index is 3', src/query_bam_records.rs:165:33

IanSudbery avatar Jun 01 '22 14:06 IanSudbery

Hi, I've encountered same issue. hread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/query_bam_records.rs:165:33

Used version is mudskipper 0.1.0

SchwarzMarek avatar Jun 07 '22 07:06 SchwarzMarek

Same error here. Using mudskipper commit 24f1597. Paired-end 150 bp. Log with export RUST_BACKTRACE=full):

bin/mudskipper bulk         --gtf transcripts-polya.gtf         --alignment test.bam --out test.toTranscriptome.bam         --max-softclip 50         --threads 12 --max_mem_mb 30000                                                                                                                                                                                                                                                                                           
[2022-06-10T21:14:44Z INFO  mudskipper] Mudskipper started...                                                                                                                                                                                                                                          
[2022-06-10T21:14:44Z INFO  mudskipper::annotation] Loading parsed GTF...                                                                                                                                                                                                                              
[2022-06-10T21:14:44Z INFO  mudskipper::bam] Number of reference sequences: 6016                                                                                                                                                                                                                       [2022-06-10T21:14:44Z INFO  mudskipper::bam] thread count: 12 in total                                                                                                                                                                                                                                 
[2022-06-10T21:14:44Z INFO  mudskipper::bam] thread count: 6 for reading                                                                                                                                                                                                                               [2022-06-10T21:14:44Z INFO  mudskipper::bam] thread count: 6 for writing                                                                                                                                                                                                                               
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/query_bam_records.rs:165:33                                                                                                                                                                                      stack backtrace:                                                                                                                                                                                                                                                                                       
   0:     0x55bedc47a3ad - std::backtrace_rs::backtrace::libunwind::trace::h22893a5306c091b4                                                                                                                                                                                                                                          at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5                                                                                                                                                      
   1:     0x55bedc47a3ad - std::backtrace_rs::backtrace::trace_unsynchronized::h29c3bc6f9e91819d                                                                                                                                                                                                                                      at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5                                                                                                                                                            
   2:     0x55bedc47a3ad - std::sys_common::backtrace::_print_fmt::he497d8a0ec903793                                                                                                                                                                                                                                                  at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:66:5                                                                                                                                                                         
   3:     0x55bedc47a3ad - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h9c2a9d2774d81873                                                                                                                                                                                                       at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:45:22                                                                                                                                                                        
   4:     0x55bedc49c5ac - core::fmt::write::hba4337c43d992f49                                                                                                                                                                                                                                                                        at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/fmt/mod.rs:1194:17                                                                                                                                                                                     5:     0x55bedc477661 - std::io::Write::write_fmt::heb73de6e02cfabed                                                                                                                                                                                                                                
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/io/mod.rs:1655:15                                                                                                                                                                                       6:     0x55bedc47be95 - std::sys_common::backtrace::_print::h63c8b24acdd8e8ce                                                                                                                                                                                                                       
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:48:5                                                                                                                                                                            7:     0x55bedc47be95 - std::sys_common::backtrace::print::h426700d6240cdcc2                                                                                                                                                                                                                        
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:35:9                                                                                                                                                                            8:     0x55bedc47be95 - std::panicking::default_hook::{{closure}}::hc9a76eed0b18f82b                                                                                                                                                                                                                
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:295:22                                                                                                                                                                                     9:     0x55bedc47bb49 - std::panicking::default_hook::h2e88d02087fae196                                                                                                                                                                                                                             
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:314:9                                                                                                                                                                                     10:     0x55bedc47c3e2 - std::panicking::rust_panic_with_hook::habfdcc2e90f9fd4c                                                                                                                                                                                                                     
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:698:17                                                                                                                                                                                    11:     0x55bedc47c2c7 - std::panicking::begin_panic_handler::{{closure}}::he054b2a83a51d2cd                                                                                                                                                                                                         
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:588:13                                                                                                                                                                                    12:     0x55bedc47a864 - std::sys_common::backtrace::__rust_end_short_backtrace::ha48b94ab49b30915                                                                                                                                                                                                   
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:138:18                                                                                                                                                                         13:     0x55bedc47bff9 - rust_begin_unwind                                                                                                                                                                                                                                                           
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5                                                                                                                                                                                     14:     0x55bedc26b333 - core::panicking::panic_fmt::h366d3a309ae17c94                                                                                                                                                                                                                               
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14                                                                                                                                                                                   15:     0x55bedc26b272 - core::panicking::panic_bounds_check::he579b0a9d71a3102                                                                                                                                                                                                                      
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:84:5                                                                                                                                                                                     16:     0x55bedc2a8188 - mudskipper::query_bam_records::BAMQueryRecordReader::group_records::h8962a44cdbf203a1                                                                                                                                                                                       
  17:     0x55bedc2a885c - mudskipper::query_bam_records::BAMQueryRecordReader::get_next_query_records::h87197d7e3d4e4f48                                                                                                                                                                                18:     0x55bedc2a494e - mudskipper::bam::bam2bam::hd8140d27dc98da38                                                                                                                                                                                                                                 
  19:     0x55bedc283b4b - mudskipper::main::h7c9075d03ee11060                                                                                                                                                                                                                                           20:     0x55bedc2a3063 - std::sys_common::backtrace::__rust_begin_short_backtrace::h0397ee588593813a                                                                                                                                                                                                 
  21:     0x55bedc2a34d9 - std::rt::lang_start::{{closure}}::hae4197f529e896bb                                                                                                                                                                                                                         
  22:     0x55bedc4721be - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::had4f69b3aefb47a8                                                                                                                                                                      
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:259:13                                                                                                                                                                              
  23:     0x55bedc4721be - std::panicking::try::do_call::hf2ad5355fcafe775                                                                                                                                                                                                                             
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:492:40                                                                                                                                                                                  
  24:     0x55bedc4721be - std::panicking::try::h0a63ac363423e61e                                                                                                                                                                                                                                      
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:456:19                                                                                                                                                                                  
  25:     0x55bedc4721be - std::panic::catch_unwind::h18088edcecb8693a                                                                                                                                                                                                                                 
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panic.rs:137:14                                                                                                                                                                                      
  26:     0x55bedc4721be - std::rt::lang_start_internal::{{closure}}::ha7dad166dc711761                                                                                                                                                                                                                
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/rt.rs:128:48                                                                                                                                                                                         
  27:     0x55bedc4721be - std::panicking::try::do_call::hda0c61bf3a57d6e6                                                                                                                                                                                                                             
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:492:40                                                                                                                                                                                  
  28:     0x55bedc4721be - std::panicking::try::hbc940e68560040a9                                                                                                                                                                                                                                      
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:456:19                                                                                                                                                                                  
  29:     0x55bedc4721be - std::panic::catch_unwind::haed0df2aeb3fa368                                                                                                                                                                                                                                 
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panic.rs:137:14                                                                                                                                                                                      
  30:     0x55bedc4721be - std::rt::lang_start_internal::h9c06694362b5b80c                                                                                                                                                                                                                             
                               at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/rt.rs:128:20                                                                                                                                                                                         
  31:     0x55bedc2850d2 - main                                                                                                                                                                                                                                                                        
  32:     0x7f20915b9f4a - __libc_start_main                                                                                                                                                                                                                                                           
  33:     0x55bedc26b5da - _start                                                                                                                                                                                                                                                                      
                               at /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120                                                                                                                                                                                             
  34:                0x0 - <unknown> 

Test data here: https://www.dropbox.com/s/st1qd1jlx9r8kbf/mudskipper-issue24.tar.gz?dl=0

Pull request #26 seems to help. EDIT: #26 doesn't completely fix this issue.

janoppelt avatar Jun 10 '22 21:06 janoppelt

I have the same error:

`➜ ~ mudskipper bulk --index mudskipper_gencode_v37 --alignment 66K00047.mapped.bam --out 66K00047.transc.bam

[2022-06-13T01:52:08Z INFO mudskipper] Mudskipper started... [2022-06-13T01:52:08Z INFO mudskipper::annotation] Loading parsed GTF... [2022-06-13T01:52:09Z INFO mudskipper::bam] Number of reference sequences: 234552 [2022-06-13T01:52:10Z INFO mudskipper::bam] thread count: 1 thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/query_bam_records.rs:165:33 note: run with RUST_BACKTRACE=1 environment variable to display a backtrace`

VitorAguiar avatar Jun 13 '22 01:06 VitorAguiar

It seems the error happens when a paired-end SAM/BAM is not name sorted and/or at least one of the paired reads is missing one of the mates. The mate doesn't have to be aligned but has to be present in the SAM/BAM file.

janoppelt avatar Jun 15 '22 19:06 janoppelt

is perhaps the problem if there is not an even number of alignments per pair? I notice that you are getting 'len 3' rather than 'len 1'. This could be a problem if there are multimappers.

On Thu, 16 Jun 2022, 03:44 Vitor, @.***> wrote:

@opplatek https://github.com/opplatek Unfortunately I can't confirm that. I mapped reads with STAR keeping unmapped reads, so I have all the reads in the BAM file. I sorted the reads by name with samtools sort -n. I still have the same error. If I extract only reads mapped in proper pairs with samtools view -f 0x2, I still get the error.

Any suggestions?

➜ simulation git:(master) ✗ mudskipper bulk --threads 4 --index mudskipper_gencode_v37 --alignment 66K00047.sortedByName.bam --out bam_transc/66K00047.bam

[2022-06-16T02:29:41Z INFO mudskipper] Mudskipper started...

[2022-06-16T02:29:41Z INFO mudskipper::annotation] Loading parsed GTF...

[2022-06-16T02:29:43Z INFO mudskipper::bam] Number of reference sequences: 234552

[2022-06-16T02:29:44Z INFO mudskipper::bam] thread count: 4 in total

[2022-06-16T02:29:44Z INFO mudskipper::bam] thread count: 2 for reading

[2022-06-16T02:29:44Z INFO mudskipper::bam] thread count: 2 for writing

thread 'main' panicked at 'index out of bounds: the len is 3 but the index is 3', src/query_bam_records.rs:165:33

note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

— Reply to this email directly, view it on GitHub https://github.com/OceanGenomics/mudskipper/issues/24#issuecomment-1157170793, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJELDXILVEN3RFKLS7I2V3VPKIKJANCNFSM5U34LXAQ . You are receiving this because you commented.Message ID: @.***>

IanSudbery avatar Jun 16 '22 07:06 IanSudbery

Hi @opplatek,

So taking a look at the test data (thanks for providing a reproducible example, btw!) I concur that what you are suggesting will be a problem here. For example, in the provided test data, the input records are not name sorted, and so, e.g. for read A00151:313:HWM35DSXY:4:2327:22652:26412 we see one of the alignments, followed by records for another read, and then the alignment for the mate.

We recently merged in a shuffle feature that should work to collate non name-sorted BAM files (i.e. position sorted BAM files) and also randomize the order. We can try to test that out to see if it produces a properly parsable input out of the BAM you provided.

However, I think the second condition you raise is a more fundamental one. That is, it is currently expected that if the read is marked as paired in sequencing, then it is expected that a record exists in the BAM file for the mate (even if unmapped). We should decide if this is a reasonable restriction, and if not, how best to lift it without making the parsing logic too complex.

Thanks, Rob

rob-p avatar Jun 16 '22 15:06 rob-p

is perhaps the problem if there is not an even number of alignments per pair? I notice that you are getting 'len 3' rather than 'len 1'. This could be a problem if there are multimappers. On Thu, 16 Jun 2022, 03:44 Vitor, @.> wrote: @opplatek https://github.com/opplatek Unfortunately I can't confirm that. I mapped reads with STAR keeping unmapped reads, so I have all the reads in the BAM file. I sorted the reads by name with samtools sort -n. I still have the same error. If I extract only reads mapped in proper pairs with samtools view -f 0x2, I still get the error. Any suggestions? ➜ simulation git:(master) ✗ mudskipper bulk --threads 4 --index mudskipper_gencode_v37 --alignment 66K00047.sortedByName.bam --out bam_transc/66K00047.bam [2022-06-16T02:29:41Z INFO mudskipper] Mudskipper started... [2022-06-16T02:29:41Z INFO mudskipper::annotation] Loading parsed GTF... [2022-06-16T02:29:43Z INFO mudskipper::bam] Number of reference sequences: 234552 [2022-06-16T02:29:44Z INFO mudskipper::bam] thread count: 4 in total [2022-06-16T02:29:44Z INFO mudskipper::bam] thread count: 2 for reading [2022-06-16T02:29:44Z INFO mudskipper::bam] thread count: 2 for writing thread 'main' panicked at 'index out of bounds: the len is 3 but the index is 3', src/query_bam_records.rs:165:33 note: run with RUST_BACKTRACE=1 environment variable to display a backtrace — Reply to this email directly, view it on GitHub <#24 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJELDXILVEN3RFKLS7I2V3VPKIKJANCNFSM5U34LXAQ . You are receiving this because you commented.Message ID: @.>

You are absolutely right. It seems the error generally applies to all paired mappings that have anything else but exactly two SAM/BAM lines per pair and are name sorted.

My suggestion would be to raise a warning that not all mates are present and/or that one of the mates is multimapped and then skip the pair. I don't think mudskipper should be making decisions on which mapping to keep.

janoppelt avatar Jun 17 '22 15:06 janoppelt

I appreciate that the update by @yfei-w now prints out the ID for the first read found without a mate in the error message. It would be nice if such alignments were just skipped, with a warning message.

VitorAguiar avatar Jun 30 '22 11:06 VitorAguiar

@VitorAguiar , as @gmarcais wrote here - https://github.com/OceanGenomics/mudskipper/pull/29#issuecomment-1170069840 , my current pull-request fails silently by treating the first read without a mate as a single-end read.

jelber2 avatar Jun 30 '22 12:06 jelber2

@VitorAguiar , as @gmarcais wrote here - #29 (comment) , my current pull-request fails silently by treating the first read without a mate as a single-end read.

Thanks @jelber2, I tested it out and it works for me!

VitorAguiar avatar Jul 01 '22 19:07 VitorAguiar

@VitorAguiar , ok, now my pull request produces a warning, that there is a missing mate for some reason

jelber2 avatar Jul 04 '22 09:07 jelber2

I tried sorting my STAR output files by name and running mudskipper but I still get this error. thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/query_bam_records.rs:165:33

Delta-43 avatar Jul 11 '22 12:07 Delta-43

Did you try one of the pull requests? Neither #29 or #26 have been merged into the main code base yet. #29 I think will result in an error message, whilst #26 will give a warning and do its best to deal with the reads missing mates.

jelber2 avatar Jul 11 '22 12:07 jelber2

Did you try one of the pull requests? Neither #29 or #26 have been merged into the main code base yet. #29 I think will result in an error message, whilst #26 will give a warning and do its best to deal with the reads missing mates.

Thanks for the suggestion, I tried with #26 STAR alignments > samtools sort - by name > samtools fixmate > mudskipper worked for me. I guess removing the unpaired reads fixes the issue inherently so not sure which helped most.

Delta-43 avatar Jul 11 '22 14:07 Delta-43

I have not tried samtools fixmate with #26 (I am the author of #26). I am guessing you got warning messages though that you had pairs with a missing mate in the logs?

jelber2 avatar Jul 11 '22 14:07 jelber2

Yes, I was getting a continuous stream of these warnings - [2022-07-11T14:54:53Z WARN mudskipper::query_bam_records] Cannot find the mate for query A01330:67:H5TG2DRX2:1:1104:8422:20275. The mate might be missing or not in the right order! If the sam/bam file is not name sorted, please consider using "mudskipper shuffle".

I checked my bamfiles with GATK ValidateSamFile to confirm, and then removed the reads with missing mates. Samtools fixmate removed about 0.6% of the reads, and mudskipper was happy.

I won't be able to confirm how much this affects the downstream analysis till later though.

Edit: Sorry, I made a slight mistake samtools fixmate doesn't remove the reads by default, but creates extra flags for those reads missing the mates. It fills in mate info by pseudo reads I think. You can optionally remove those reads.

Delta-43 avatar Jul 11 '22 15:07 Delta-43

@Delta-43 I guess the question is more of a personal one- do you also want to keep singletons (not what you workflow that you describe is doing - it would do this if you just used STAR alignments --outSAMtype BAM Unsorted > #26 mudskipper or samtools sort -n > #26 mudskipper or do you want to keep only instances where both reads of a pair map (what your workflow is doing).

jelber2 avatar Jul 12 '22 06:07 jelber2

This should be fixed via PR #29, which was just merged. I will keep this issue open for testing and discussion until this makes it into a tagged release.

rob-p avatar Aug 17 '22 20:08 rob-p