Htseq_count can fail with RNA STAR input

Open jennaj opened this issue 7 years ago • 6 comments

Tracking ticket: once the problem is resolved and Main is updated (as needed), we can close this out.

Workaround: use HISAT2 instead of RNA STAR.

Example error

Fatal error: Unknown error occured
[bam_sort_core] merging from 32 files...
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
500000 GFF lines processed.
600000 GFF lines processed.
700000 GFF lines processed.
741207 GFF lines processed.
Error occured when processing SAM input (record #894 in file name_sorted_alignment.bam):
  unsigned byte integer is less than minimum
  [Exception type: OverflowError, raised in csamtools.pyx:2308]

Potentially the root issue: https://www.biostars.org/p/147487/
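
For triage, it can help to pull the record number and exception details out of the tool's stderr so the offending alignment can be inspected directly. The following is a minimal sketch, not part of the tool or of Galaxy: the function name `parse_htseq_error` and the two regular expressions are assumptions written to match only the message format quoted above (including the log's own spelling, "occured").

```python
import re

# Hypothetical patterns, written against the exact message quoted in this issue.
# "occured" is spelled as it appears in the log output.
RECORD_RE = re.compile(
    r"Error occured when processing SAM input \(record #(\d+) in file (.+?)\):"
)
EXC_RE = re.compile(r"\[Exception type: (\w+), raised in ([^\]:]+):(\d+)\]")


def parse_htseq_error(log_text):
    """Extract the failing record number, file, and exception info.

    Returns a dict, or None if the log holds no matching error line.
    """
    rec = RECORD_RE.search(log_text)
    if rec is None:
        return None
    info = {"record": int(rec.group(1)), "file": rec.group(2)}
    exc = EXC_RE.search(log_text)
    if exc is not None:
        info["exception"] = exc.group(1)
        info["source"] = exc.group(2)
        info["line"] = int(exc.group(3))
    return info


# The tail of the example error from this issue, copied verbatim.
EXAMPLE_LOG = """741207 GFF lines processed.
Error occured when processing SAM input (record #894 in file name_sorted_alignment.bam):
  unsigned byte integer is less than minimum
  [Exception type: OverflowError, raised in csamtools.pyx:2308]
"""

if __name__ == "__main__":
    print(parse_htseq_error(EXAMPLE_LOG))
```

With the record number in hand, the matching alignment in the name-sorted BAM can be viewed and compared against the linked Biostars thread.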

Comments from @natefoo:

I believe the message is coming from the version of pysam in use by htseq (not samtools as used by the tool or pysam in the Galaxy framework). But it looks like we are using the latest htseq dependency supported by the IUC tool, 0.6.1.post1 (even though we're still using the tool from Lance's repo):

https://github.com/galaxyproject/tools-iuc/blob/6f82cbc16053cecdf58d15a8d0fcdeac7991abaf/tools/htseq_count/htseq-count.xml#L4

I'd pass this on to the IUC to see if they have any ideas.

jennaj, Jan 25 '18 21:01

@davebx is updating to htseq-count 0.9.1, which hopefully fixes it (or at least it's worth testing once it's updated).

natefoo, Mar 19 '18 17:03

Test history: https://usegalaxy.org/u/jen/h/test-history-rnastar (includes updated star 2.5.2b-0 + htseq 0.9.1)

I don't think this exact test will capture the specific error above, and we don't have an example of the inputs that trigger it (there is no original test history because the end user deleted it before I could get a copy in January), but we can watch for it being reported again.

I'll close this out once the general-usage tests finish overnight.

jennaj, Mar 19 '18 23:03

I can't get HTSeq to work (no features overlap, even when using HISAT2 input). featureCounts works with the same inputs. I am using tutorial data that should work with both.

Second test history: https://usegalaxy.org/u/jen/h/test-history-cufflinks-hisat2

I'll need to troubleshoot this more.

jennaj, Mar 29 '18 17:03

Trying again with newer versions of HISAT2 and STAR, plus complete reruns with different params. In progress, in the same test history as in the prior comment.

jennaj, Mar 29 '18 17:03

Still a problem. It looks like the tool was updated in the MTS (a bug fix) without a revision change. Could we update it to the most current MTS version and see if this problem goes away, too?

https://github.com/galaxyproject/usegalaxy-playbook/issues/124

jennaj, May 14 '18 20:05

Retesting with different data. I think it works with some data, so this might be a corner-case tool bug.

jennaj, May 16 '18 21:05