mudskipper
mudskipper copied to clipboard
'main' panicked at 'index out of bounds: the len is 1 but the index is 1'
Built the index first using gencode v40
mudskipper index --gtf /SAN/vyplab/vyplab_reference_genomes/annotation/human/GRCh38/gencode.v40.annotation.gtf --dir-index /SAN/vyplab/vyplab_reference_genomes/mudskipper_indices/human.gencode.v40
And then tried to convert with
mudskipper bulk --index /SAN/vyplab/vyplab_reference_genomes/mudskipper_indices/human.gencode.v40/ --alignment my.sorted.bam --out my.mud.bam --threads 4
However this freaks out immediately with:
[2022-05-02T12:43:07Z INFO mudskipper] Mudskipper started...
[2022-05-02T12:43:07Z INFO mudskipper::annotation] Loading parsed GTF...
[2022-05-02T12:43:08Z INFO mudskipper::bam] Number of reference sequences: 246624
[2022-05-02T12:43:09Z INFO mudskipper::bam] thread count: 4 in total
[2022-05-02T12:43:09Z INFO mudskipper::bam] thread count: 2 for reading
[2022-05-02T12:43:09Z INFO mudskipper::bam] thread count: 2 for writing
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/query_bam_records.rs:165:33
stack backtrace:
0: 0x5599e7546c2d - std::backtrace_rs::backtrace::libunwind::trace::hee598835bc88d35b
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x5599e7546c2d - std::backtrace_rs::backtrace::trace_unsynchronized::h9cdc730ba5cf5d72
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x5599e7546c2d - std::sys_common::backtrace::_print_fmt::h75aeaf7ed30e43fa
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:66:5
3: 0x5599e7546c2d - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h606862f787600875
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:45:22
4: 0x5599e7569a9c - core::fmt::write::he803f0f418caf762
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/fmt/mod.rs:1190:17
5: 0x5599e7542fc8 - std::io::Write::write_fmt::h70bc45872f37e7bb
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/io/mod.rs:1657:15
6: 0x5599e7548cd7 - std::sys_common::backtrace::_print::h64d038cf8ac3e13e
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:48:5
7: 0x5599e7548cd7 - std::sys_common::backtrace::print::h359300b4a7fccf65
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:35:9
8: 0x5599e7548cd7 - std::panicking::default_hook::{{closure}}::hf51be35e2f510149
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:295:22
9: 0x5599e75489a0 - std::panicking::default_hook::h03ca0f22e1d2d25e
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:314:9
10: 0x5599e7549429 - std::panicking::rust_panic_with_hook::h3b7380e99b825b63
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:698:17
11: 0x5599e7549117 - std::panicking::begin_panic_handler::{{closure}}::h8e849d0710154ce0
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:588:13
12: 0x5599e75470f4 - std::sys_common::backtrace::__rust_end_short_backtrace::hedcdaddbd4c46cc5
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/sys_common/backtrace.rs:138:18
13: 0x5599e7548e29 - rust_begin_unwind
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
14: 0x5599e737dcb3 - core::panicking::panic_fmt::he1bbc7336d49a357
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
15: 0x5599e737dbf2 - core::panicking::panic_bounds_check::h7777e18766a27b0b
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:85:5
16: 0x5599e73a4d1f - mudskipper::query_bam_records::BAMQueryRecordReader::group_records::h3e557bbfbf98c5b3
17: 0x5599e73a55cd - mudskipper::query_bam_records::BAMQueryRecordReader::get_next_query_records::hd44f4b13b6418ce6
18: 0x5599e73946dd - mudskipper::bam::bam2bam::h107271027deeb930
19: 0x5599e739069c - mudskipper::main::hd9af9ad03c0768b5
20: 0x5599e7393d23 - std::sys_common::backtrace::__rust_begin_short_backtrace::h6d04830762f92fce
21: 0x5599e73a6249 - std::rt::lang_start::{{closure}}::h3a11e2c99d3705be
22: 0x5599e7546311 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hb7014f43484a8b4e
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/ops/function.rs:259:13
23: 0x5599e7546311 - std::panicking::try::do_call::h7bc9dc436daeb8c7
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:492:40
24: 0x5599e7546311 - std::panicking::try::h653d68a27ff5f175
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:456:19
25: 0x5599e7546311 - std::panic::catch_unwind::h9d739f9f59895e68
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panic.rs:137:14
26: 0x5599e7546311 - std::rt::lang_start_internal::{{closure}}::hf006f2bc7ce22bbe
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/rt.rs:128:48
27: 0x5599e7546311 - std::panicking::try::do_call::hfb39d6df61a2e69f
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:492:40
28: 0x5599e7546311 - std::panicking::try::h13e2d225134958ac
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:456:19
29: 0x5599e7546311 - std::panic::catch_unwind::h3bd49b5a5dfb1a50
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panic.rs:137:14
30: 0x5599e7546311 - std::rt::lang_start_internal::h2ba92edce36c035e
at /rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/rt.rs:128:20
31: 0x5599e7391632 - main
32: 0x2b9c394ed555 - __libc_start_main
33: 0x5599e737e4e9 - <unknown>
34: 0x0 - <unknown>
I get the same error on a few other BAMs I've tried as well
And the same error persist if I do this
mudskipper bulk --gtf /SAN/vyplab/vyplab_reference_genomes/annotation/human/GRCh38/gencode.v40.annotation.gtf --alignment my.sorted.bam --out my.mud.bam --threads 4
Hi @aleighbrown,
Thank you for reporting this issue! Would you be able to share the BAM file that is causing this issue, or at least a small subset of it? This will help greatly in debugging the problem.
Thanks, Rob
I am also getting this error - see details below. Hope this helps with debugging.
rust setup
rustup show
Default host: x86_64-unknown-linux-gnu
rustup home: /nfs/scistore16/itgrp/jelbers/.rustup
installed toolchains
--------------------
stable-x86_64-unknown-linux-gnu (default)
nightly-x86_64-unknown-linux-gnu
active toolchain
----------------
stable-x86_64-unknown-linux-gnu (default)
rustc 1.60.0 (7737e0b5c 2022-04-04)
mudskipper installation
cd ~/git/
git clone https://github.com/OceanGenomics/mudskipper
cd mudskipper
git show
commit f6382774aa84332719cd545cffd776604c2bc79a (HEAD -> main, origin/main, origin/HEAD)
cargo build --release
get GTF file and genome
wget ftp://ftp.ensemblgenomes.org/pub/metazoa/release-46/gtf/caenorhabditis_elegans/Caenorhabditis_elegans.WBcel235.46.gtf.gz
gunzip Caenorhabditis_elegans.WBcel235.46.gtf.gz
wget ftp://ftp.ensemblgenomes.org/pub/metazoa/release-46/fasta/caenorhabditis_elegans/dna/Caenorhabditis_elegans.WBcel235.dna.toplevel.fa.gz
gunzip Caenorhabditis_elegans.WBcel235.dna.toplevel.fa.gz
STAR index genome
module load seqtk/20210125
# make sequences upper case (remove soft-masking)
seqtk seq -CU Caenorhabditis_elegans.WBcel235.dna.toplevel.fa > Caenorhabditis_elegans.WBcel235.dna.toplevel_uppercase.fa
module load STAR/2.7.9a
mkdir -p Caenorhabditis_elegans.WBcel235.dna.toplevel_Ensembl46_STAR-2.7.9a_sjdbOverhang18
STAR \
--genomeSAindexNbases 12 \
--limitGenomeGenerateRAM 100000000000 \
--runThreadN 34 \
--sjdbOverhang 18 \
--runMode genomeGenerate \
--genomeDir Caenorhabditis_elegans.WBcel235.dna.toplevel_Ensembl46_STAR-2.7.9a_sjdbOverhang18 \
--genomeFastaFiles Caenorhabditis_elegans.WBcel235.dna.toplevel_uppercase.fa \
--sjdbGTFfile Caenorhabditis_elegans.WBcel235.46.gtf
STAR mapping
module load STAR/2.7.9a
STAR --runThreadN 32 \
--genomeDir auxData/Caenorhabditis_elegans.WBcel235.dna.toplevel_Ensembl46_STAR-2.7.9a_sjdbOverhang18 \
--readFilesCommand zcat \
--outSAMtype BAM SortedByCoordinate \
--outFileNamePrefix mapped/ref/RNAseq.MUT1.rep4 \
--outTmpDir TMP/ref_RNAseq.MUT1.rep4/TEMP/ \
--outReadsUnmapped Fastx \
--readFilesIn ../../analysis.1/quantification/fastq/RNAseq.MUT1.rep4-trimmed.mate1.fastq.gz \
../../analysis.1/quantification/fastq/RNAseq.MUT1.rep4-trimmed.mate2.fastq.gz
Get only first two-alignments and header
samtools view -h mapped/ref/RNAseq.MUT1.rep4Aligned.sortedByCoord.out.bam|head -n 13|samtools view -Sb > test.bam
mudskipper index GTF
export PATH="/nfs/scistore16/itgrp/jelbers/git/mudskipper/target/release:$PATH"
mudskipper index --gtf auxData/Caenorhabditis_elegans.WBcel235.46.gtf --dir-index auxData/mudskipper
[2022-05-12T06:02:52Z INFO mudskipper] Mudskipper started...
[2022-05-12T06:02:52Z INFO mudskipper::annotation] reading the gtf file and building the tree.
[2022-05-12T06:02:59Z INFO mudskipper::annotation] building the tree
[2022-05-12T06:02:59Z INFO mudskipper::annotation] Time to build the tree: 7.234329464s
[2022-05-12T06:02:59Z INFO mudskipper::annotation] saving the GTF index
[2022-05-12T06:03:03Z INFO mudskipper::annotation] Done with saving the GTF index
[2022-05-12T06:03:03Z INFO mudskipper] Mudskipper finished.
mudskipper bulk
mudskipper bulk --threads 32 --alignment test.bam --index auxData/mudskipper --out alignments.bam
[2022-05-12T06:14:27Z INFO mudskipper] Mudskipper started...
[2022-05-12T06:14:27Z INFO mudskipper::annotation] Loading parsed GTF...
[2022-05-12T06:14:28Z INFO mudskipper::bam] Number of reference sequences: 61451
[2022-05-12T06:14:28Z INFO mudskipper::bam] thread count: 32 in total
[2022-05-12T06:14:28Z INFO mudskipper::bam] thread count: 16 for reading
[2022-05-12T06:14:28Z INFO mudskipper::bam] thread count: 16 for writing
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/query_bam_records.rs:165:33
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
zip up files for help
zip mudskipper-help.zip test.bam auxData/mudskipper/ -r
download zip file with wget
wget https://www.dropbox.com/s/5f9vrzs2yxx8yvn/mudskipper-help.zip
Awesome! Thanks for the reproducibility data. We'll take a look at what could be going on here. Cc @HosseinAsghari @gmarcais.
I'm seeing the same thing, but with index out of bounds: the len is 3 but the index is 3', src/query_bam_records.rs:165:33
Hi, I've encountered same issue.
hread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/query_bam_records.rs:165:33
Used version is mudskipper 0.1.0
Same error here. Using mudskipper commit 24f1597. Paired-end 150 bp. Log with export RUST_BACKTRACE=full):
bin/mudskipper bulk --gtf transcripts-polya.gtf --alignment test.bam --out test.toTranscriptome.bam --max-softclip 50 --threads 12 --max_mem_mb 30000
[2022-06-10T21:14:44Z INFO mudskipper] Mudskipper started...
[2022-06-10T21:14:44Z INFO mudskipper::annotation] Loading parsed GTF...
[2022-06-10T21:14:44Z INFO mudskipper::bam] Number of reference sequences: 6016 [2022-06-10T21:14:44Z INFO mudskipper::bam] thread count: 12 in total
[2022-06-10T21:14:44Z INFO mudskipper::bam] thread count: 6 for reading [2022-06-10T21:14:44Z INFO mudskipper::bam] thread count: 6 for writing
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/query_bam_records.rs:165:33 stack backtrace:
0: 0x55bedc47a3ad - std::backtrace_rs::backtrace::libunwind::trace::h22893a5306c091b4 at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x55bedc47a3ad - std::backtrace_rs::backtrace::trace_unsynchronized::h29c3bc6f9e91819d at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x55bedc47a3ad - std::sys_common::backtrace::_print_fmt::he497d8a0ec903793 at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:66:5
3: 0x55bedc47a3ad - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h9c2a9d2774d81873 at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:45:22
4: 0x55bedc49c5ac - core::fmt::write::hba4337c43d992f49 at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/fmt/mod.rs:1194:17 5: 0x55bedc477661 - std::io::Write::write_fmt::heb73de6e02cfabed
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/io/mod.rs:1655:15 6: 0x55bedc47be95 - std::sys_common::backtrace::_print::h63c8b24acdd8e8ce
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:48:5 7: 0x55bedc47be95 - std::sys_common::backtrace::print::h426700d6240cdcc2
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:35:9 8: 0x55bedc47be95 - std::panicking::default_hook::{{closure}}::hc9a76eed0b18f82b
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:295:22 9: 0x55bedc47bb49 - std::panicking::default_hook::h2e88d02087fae196
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:314:9 10: 0x55bedc47c3e2 - std::panicking::rust_panic_with_hook::habfdcc2e90f9fd4c
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:698:17 11: 0x55bedc47c2c7 - std::panicking::begin_panic_handler::{{closure}}::he054b2a83a51d2cd
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:588:13 12: 0x55bedc47a864 - std::sys_common::backtrace::__rust_end_short_backtrace::ha48b94ab49b30915
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/sys_common/backtrace.rs:138:18 13: 0x55bedc47bff9 - rust_begin_unwind
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5 14: 0x55bedc26b333 - core::panicking::panic_fmt::h366d3a309ae17c94
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14 15: 0x55bedc26b272 - core::panicking::panic_bounds_check::he579b0a9d71a3102
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:84:5 16: 0x55bedc2a8188 - mudskipper::query_bam_records::BAMQueryRecordReader::group_records::h8962a44cdbf203a1
17: 0x55bedc2a885c - mudskipper::query_bam_records::BAMQueryRecordReader::get_next_query_records::h87197d7e3d4e4f48 18: 0x55bedc2a494e - mudskipper::bam::bam2bam::hd8140d27dc98da38
19: 0x55bedc283b4b - mudskipper::main::h7c9075d03ee11060 20: 0x55bedc2a3063 - std::sys_common::backtrace::__rust_begin_short_backtrace::h0397ee588593813a
21: 0x55bedc2a34d9 - std::rt::lang_start::{{closure}}::hae4197f529e896bb
22: 0x55bedc4721be - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::had4f69b3aefb47a8
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:259:13
23: 0x55bedc4721be - std::panicking::try::do_call::hf2ad5355fcafe775
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:492:40
24: 0x55bedc4721be - std::panicking::try::h0a63ac363423e61e
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:456:19
25: 0x55bedc4721be - std::panic::catch_unwind::h18088edcecb8693a
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panic.rs:137:14
26: 0x55bedc4721be - std::rt::lang_start_internal::{{closure}}::ha7dad166dc711761
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/rt.rs:128:48
27: 0x55bedc4721be - std::panicking::try::do_call::hda0c61bf3a57d6e6
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:492:40
28: 0x55bedc4721be - std::panicking::try::hbc940e68560040a9
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:456:19
29: 0x55bedc4721be - std::panic::catch_unwind::haed0df2aeb3fa368
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panic.rs:137:14
30: 0x55bedc4721be - std::rt::lang_start_internal::h9c06694362b5b80c
at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/rt.rs:128:20
31: 0x55bedc2850d2 - main
32: 0x7f20915b9f4a - __libc_start_main
33: 0x55bedc26b5da - _start
at /home/abuild/rpmbuild/BUILD/glibc-2.26/csu/../sysdeps/x86_64/start.S:120
34: 0x0 - <unknown>
Test data here: https://www.dropbox.com/s/st1qd1jlx9r8kbf/mudskipper-issue24.tar.gz?dl=0
Pull request #26 seems to help. EDIT: #26 doesn't completely fix this issue.
I have the same error:
`➜ ~ mudskipper bulk --index mudskipper_gencode_v37 --alignment 66K00047.mapped.bam --out 66K00047.transc.bam
[2022-06-13T01:52:08Z INFO mudskipper] Mudskipper started...
[2022-06-13T01:52:08Z INFO mudskipper::annotation] Loading parsed GTF...
[2022-06-13T01:52:09Z INFO mudskipper::bam] Number of reference sequences: 234552
[2022-06-13T01:52:10Z INFO mudskipper::bam] thread count: 1
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/query_bam_records.rs:165:33
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace`
It seems the error happens when a paired-end SAM/BAM is not name sorted and/or at least one of the paired reads is missing one of the mates. The mate doesn't have to be aligned but has to be present in the SAM/BAM file.
is perhaps the problem if there is not an even number of alignments per pair? I notice that you are getting 'len 3' rather than 'len 1'. This could be a problem if there are multimappers.
On Thu, 16 Jun 2022, 03:44 Vitor, @.***> wrote:
@opplatek https://github.com/opplatek Unfortunately I can't confirm that. I mapped reads with STAR keeping unmapped reads, so I have all the reads in the BAM file. I sorted the reads by name with samtools sort -n. I still have the same error. If I extract only reads mapped in proper pairs with samtools view -f 0x2, I still get the error.
Any suggestions?
➜ simulation git:(master) ✗ mudskipper bulk --threads 4 --index mudskipper_gencode_v37 --alignment 66K00047.sortedByName.bam --out bam_transc/66K00047.bam
[2022-06-16T02:29:41Z INFO mudskipper] Mudskipper started...
[2022-06-16T02:29:41Z INFO mudskipper::annotation] Loading parsed GTF...
[2022-06-16T02:29:43Z INFO mudskipper::bam] Number of reference sequences: 234552
[2022-06-16T02:29:44Z INFO mudskipper::bam] thread count: 4 in total
[2022-06-16T02:29:44Z INFO mudskipper::bam] thread count: 2 for reading
[2022-06-16T02:29:44Z INFO mudskipper::bam] thread count: 2 for writing
thread 'main' panicked at 'index out of bounds: the len is 3 but the index is 3', src/query_bam_records.rs:165:33
note: run with
RUST_BACKTRACE=1environment variable to display a backtrace— Reply to this email directly, view it on GitHub https://github.com/OceanGenomics/mudskipper/issues/24#issuecomment-1157170793, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJELDXILVEN3RFKLS7I2V3VPKIKJANCNFSM5U34LXAQ . You are receiving this because you commented.Message ID: @.***>
Hi @opplatek,
So taking a look at the test data (thanks for providing a reproducible example, btw!) I concur that what you are suggesting will be a problem here. For example, in the provided test data, the input records are not name sorted, and so, e.g. for read A00151:313:HWM35DSXY:4:2327:22652:26412 we see one of the alignments, followed by records for another read, and then the alignment for the mate.
We recently merged in a shuffle feature that should work to collate non name-sorted BAM files (i.e. position sorted BAM files) and also randomize the order. We can try to test that out to see if it produces a properly parsable input out of the BAM you provided.
However, I think the second condition you raise is a more fundamental one. That is, it is currently expected that if the read is marked as paired in sequencing, then it is expected that a record exists in the BAM file for the mate (even if unmapped). We should decide if this is a reasonable restriction, and if not, how best to lift it without making the parsing logic too complex.
Thanks, Rob
is perhaps the problem if there is not an even number of alignments per pair? I notice that you are getting 'len 3' rather than 'len 1'. This could be a problem if there are multimappers. … On Thu, 16 Jun 2022, 03:44 Vitor, @.> wrote: @opplatek https://github.com/opplatek Unfortunately I can't confirm that. I mapped reads with STAR keeping unmapped reads, so I have all the reads in the BAM file. I sorted the reads by name with samtools sort -n. I still have the same error. If I extract only reads mapped in proper pairs with samtools view -f 0x2, I still get the error. Any suggestions? ➜ simulation git:(master) ✗ mudskipper bulk --threads 4 --index mudskipper_gencode_v37 --alignment 66K00047.sortedByName.bam --out bam_transc/66K00047.bam [2022-06-16T02:29:41Z INFO mudskipper] Mudskipper started... [2022-06-16T02:29:41Z INFO mudskipper::annotation] Loading parsed GTF... [2022-06-16T02:29:43Z INFO mudskipper::bam] Number of reference sequences: 234552 [2022-06-16T02:29:44Z INFO mudskipper::bam] thread count: 4 in total [2022-06-16T02:29:44Z INFO mudskipper::bam] thread count: 2 for reading [2022-06-16T02:29:44Z INFO mudskipper::bam] thread count: 2 for writing thread 'main' panicked at 'index out of bounds: the len is 3 but the index is 3', src/query_bam_records.rs:165:33 note: run with
RUST_BACKTRACE=1environment variable to display a backtrace — Reply to this email directly, view it on GitHub <#24 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABJELDXILVEN3RFKLS7I2V3VPKIKJANCNFSM5U34LXAQ . You are receiving this because you commented.Message ID: @.>
You are absolutely right. It seems the error generally applies to all paired mappings that have anything else but exactly two SAM/BAM lines per pair and are name sorted.
My suggestion would be to raise a warning that not all mates are present and/or that one of the mates is multimapped and then skip the pair. I don't think mudskipper should be making decisions on which mapping to keep.
I appreciate that the update by @yfei-w now prints out the ID for the first read found without a mate in the error message. It would be nice if such alignments were just skipped, with a warning message.
@VitorAguiar , as @gmarcais wrote here - https://github.com/OceanGenomics/mudskipper/pull/29#issuecomment-1170069840 , my current pull-request fails silently by treating the first read without a mate as a single-end read.
@VitorAguiar , as @gmarcais wrote here - #29 (comment) , my current pull-request fails silently by treating the first read without a mate as a single-end read.
Thanks @jelber2, I tested it out and it works for me!
@VitorAguiar , ok, now my pull request produces a warning, that there is a missing mate for some reason
I tried sorting my STAR output files by name and running mudskipper but I still get this error.
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/query_bam_records.rs:165:33
Did you try one of the pull requests? Neither #29 or #26 have been merged into the main code base yet. #29 I think will result in an error message, whilst #26 will give a warning and do its best to deal with the reads missing mates.
Did you try one of the pull requests? Neither #29 or #26 have been merged into the main code base yet. #29 I think will result in an error message, whilst #26 will give a warning and do its best to deal with the reads missing mates.
Thanks for the suggestion, I tried with #26
STAR alignments > samtools sort - by name > samtools fixmate > mudskipper
worked for me. I guess removing the unpaired reads fixes the issue inherently so not sure which helped most.
I have not tried samtools fixmate with #26 (I am the author of #26). I am guessing you got warning messages though that you had pairs with a missing mate in the logs?
Yes, I was getting a continuous stream of these warnings -
[2022-07-11T14:54:53Z WARN mudskipper::query_bam_records] Cannot find the mate for query A01330:67:H5TG2DRX2:1:1104:8422:20275. The mate might be missing or not in the right order! If the sam/bam file is not name sorted, please consider using "mudskipper shuffle".
I checked my bamfiles with GATK ValidateSamFile to confirm, and then removed the reads with missing mates. Samtools fixmate removed about 0.6% of the reads, and mudskipper was happy.
I won't be able to confirm how much this affects the downstream analysis till later though.
Edit: Sorry, I made a slight mistake samtools fixmate doesn't remove the reads by default, but creates extra flags for those reads missing the mates. It fills in mate info by pseudo reads I think. You can optionally remove those reads.
@Delta-43 I guess the question is more of a personal one- do you also want to keep singletons (not what you workflow that you describe is doing - it would do this if you just used STAR alignments --outSAMtype BAM Unsorted > #26 mudskipper or samtools sort -n > #26 mudskipper or do you want to keep only instances where both reads of a pair map (what your workflow is doing).
This should be fixed via PR #29, which was just merged. I will keep this issue open for testing and discussion until this makes it into a tagged release.