libStatGen icon indicating copy to clipboard operation
libStatGen copied to clipboard

Retrieve unmapped reads through SetReadSection(chr,beg,end) failed!

Open Griffan opened this issue 8 years ago • 4 comments

I tried to retrieve unmapped reads via setting SetReadSection("*",-1,-1) or SetReadSection("",-1,-1), but failed in both ways. I wonder if I used it in a wrong way. I also tested it using bam file only contains unmapped reads, which results in failure at SamRecord.cpp:625 line

if(ifread(filePtr, &(myRecordPtr->myReferenceID), myRecordPtr->myBlockSize) != (unsigned int)myRecordPtr->myBlockSize) { // Error reading the record. Reset it and return failure. resetRecord(); std::string statusMsg = "Failed to read the record, "; statusMsg += filePtr->getFileName(); statusMsg += "."; myStatus.setStatus(SamStatus::FAIL_IO, statusMsg.c_str()); return(SamStatus::FAIL_IO); }

The snapshot in GDB: (gdb) p myRecordPtr->myBlockSize $13 = 21840194

Anybody experienced this problem before?

Griffan avatar Mar 06 '16 02:03 Griffan

Either of those settings should work (and did when I just tested them). Is your index valid? That block size seems very large. Can you point me to your test files or your code that is calling SetReadSection, and I'll see if I can figure out what is going wrong.

mktrost avatar Mar 06 '16 02:03 mktrost

The bam file and index: /net/wonderland/home/fanzhang/1000g/data/9.bam2Fastq/read_data_test/unmap.bam /net/wonderland/home/fanzhang/1000g/data/9.bam2Fastq/read_data_test/unmap.bam.bai

The source files lies in: /net/wonderland/home/fanzhang/1000g/data/9.bam2Fastq/gotcloud/src/bamUtil/src/MateVectorByRN.cpp

SetReadSection function called at line 165

Thanks a lot!

Griffan avatar Mar 06 '16 03:03 Griffan

Are you only running into this problem when you have a file that is just unmapped reads? That is something I hadn't thought of before - the index file contains no offset into the file. When the code asks the Bam Index file where the previous chromosome ends, it returns 0, and the code assumes unmapped starts at 0 (after the last chromosome).

So yes, I guess that is a bug - it isn't the easiest fix as it will require a change in logic. See BamIndex.cpp lines 234-244. MaxOffset it coming back as 0. We may need to adjust SamFile around 1194 to special handle if chunk_beg is 0 and we are looking for unmapped reads - in that case, it needs to jump to after the header.

I'm heading out on vacation and won't be able to make changes to fix this right now.

mktrost avatar Mar 06 '16 06:03 mktrost

I added some mapped reads into this file, and bypassed this problem. Normal bam file should be fine.

Griffan avatar Mar 06 '16 23:03 Griffan