SalmonTools

Rare Infinite Loop While Extracting Unmapped Reads?

Open · Miserlou opened this issue 7 years ago · 13 comments

Perhaps you can shed some light on this: very occasionally, we see SalmonTools processes that never seem to terminate.

Here you can see some that have been running for more than 4 hours and are still consuming full CPU: [screenshot, 2018-09-23 at 1:37:31 pm]
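While the root cause is tracked down, a pipeline can at least stop a hung extraction from holding a CPU indefinitely. The sketch below is a hypothetical wrapper (not part of SalmonTools or of the pipeline described here) that runs a command under a wall-clock limit using Python's `subprocess.run(..., timeout=...)`. The command itself is left as a placeholder, since the exact SalmonTools invocation is not shown in this thread.

```python
import subprocess
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if it exceeded timeout_s.

    Hypothetical helper: when the timeout expires, subprocess.run kills the
    child, waits for it, and re-raises TimeoutExpired. We turn that into a
    None result so the caller can mark the job as hung instead of waiting
    forever.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

For example, `run_with_timeout(cmd, timeout_s=2 * 60 * 60)` would give up after two hours, where `cmd` is whatever argument list the pipeline already uses. The limit is an arbitrary choice and should sit well above the longest successful run.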

Here is the sample in question: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM2432103

Do you have any idea what might be causing this?

Sorry that this isn't a more reproducible report!

Miserlou · Sep 24 '18 15:09

Here are the 10 accession codes with the longest-running jobs that did complete successfully, along with the transcriptome index (short or long) we used for each:

accession_code |     index_type      
----------------+---------------------
 SRR4423743     | TRANSCRIPTOME_SHORT
 SRR5342767     | TRANSCRIPTOME_SHORT
 SRR3666783     | TRANSCRIPTOME_SHORT
 SRR6494603     | TRANSCRIPTOME_SHORT
 SRR1524241     | TRANSCRIPTOME_LONG
 SRR4423749     | TRANSCRIPTOME_SHORT
 SRR6297667     | TRANSCRIPTOME_LONG
 SRR6877472     | TRANSCRIPTOME_LONG
 SRR4423750     | TRANSCRIPTOME_SHORT
 SRR6494612     | TRANSCRIPTOME_SHORT

These transcriptome indices can be downloaded here:

https://s3.amazonaws.com/data-refinery-s3-transcriptome-index-circleci-prod/DANIO_RERIO_TRANSCRIPTOME_LONG.tar.gz

https://s3.amazonaws.com/data-refinery-s3-transcriptome-index-circleci-prod/DANIO_RERIO_TRANSCRIPTOME_SHORT.tar.gz

kurtwheeler · Sep 28 '18 14:09

These samples are also derived from .sra files, extracted with fasterq-dump.
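Because these FASTQ files come from fasterq-dump, one thing worth checking is that read names are compared in the same form on both sides (the FASTQ headers and unmapped_names.txt). The helper below is a hypothetical normalizer, assuming headers shaped like `@SRR4423743.1 1 length=76` (an illustrative example, not taken from these files) and that only the first whitespace-delimited token identifies the read. It also drops a trailing `/1` or `/2` mate suffix. Whether name mismatches play any role in this issue is not established in the thread.

```python
import re

# Assumed convention: some FASTQ producers append "/1" or "/2" to mark mates.
_MATE_SUFFIX = re.compile(r"/[12]$")


def normalize_read_name(header: str) -> str:
    """Reduce a FASTQ header line to a bare read name (hypothetical helper).

    Strips surrounding whitespace and a leading '@', keeps only the first
    whitespace-delimited token, and removes a trailing /1 or /2 suffix.
    """
    name = header.strip()
    if name.startswith("@"):
        name = name[1:]
    name = name.split(maxsplit=1)[0] if name else name
    return _MATE_SUFFIX.sub("", name)
```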

Could our issue have anything to do with the bug mentioned in this unmerged pull request?

Miserlou · Sep 28 '18 14:09

To follow up on @rob-p's request, I am tagging @hiraksarkar.

cgreene · Sep 28 '18 14:09

@cgreene, thanks for tagging. I'm looking into the failure.

hiraksarkar · Sep 28 '18 15:09

@Miserlou are you running the SalmonTools master branch?

hiraksarkar · Sep 28 '18 15:09

Yes, we run git clone https://github.com/COMBINE-lab/SalmonTools.git and build from that. Does your branch fix this error?

Miserlou · Sep 28 '18 15:09

Hi @Miserlou, I forked a version and use it with some modifications, since I wanted the extracted files to be zipped, but the code is more or less the same. The fork I use is https://github.com/hiraksarkar/SalmonTools. I generally use FASTQ files from the EMBL sites. Could you share a copy of your FASTQ files via Zenodo or some other storage? I will debug with those.

PS: If possible, please also share the unmapped_names.txt file.
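For context, extracting unmapped reads amounts to filtering the FASTQ by the names listed in unmapped_names.txt. The following is a minimal Python sketch of that filter, not the SalmonTools implementation, assuming each line of unmapped_names.txt begins with a read name optionally followed by whitespace and a flag. Two properties are relevant to this issue: the names live in a hash set, so each read costs one average O(1) lookup and total work grows linearly with the number of reads; and the reader checks for end of file explicitly, so a truncated final record raises an error instead of looping.

```python
from typing import Iterable, Iterator, Set, TextIO, Tuple

# One FASTQ record: header, sequence, separator ("+"), quality.
FastqRecord = Tuple[str, str, str, str]


def load_unmapped_names(lines: Iterable[str]) -> Set[str]:
    """Collect read names, assuming each line is '<name>' or '<name> <flag>'."""
    names = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            names.add(fields[0])
    return names


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; raise on a truncated record instead of looping."""
    while True:
        header = handle.readline()
        if not header:
            return  # clean end of file
        seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
        if not qual:
            raise ValueError(f"truncated FASTQ record starting at {header.strip()!r}")
        yield header.rstrip("\n"), seq.rstrip("\n"), plus.rstrip("\n"), qual.rstrip("\n")


def extract_unmapped(names: Set[str], handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records whose name (header minus '@', up to first whitespace) is in names."""
    for record in read_fastq(handle):
        parts = record[0][1:].split()
        if parts and parts[0] in names:
            yield record
```

Writing the yielded records back out, compressed or not, is left to the caller.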

hiraksarkar · Sep 28 '18 15:09

@hiraksarkar: We would also prefer that the files be zipped; the next step of our process is actually to zip them. If you and we are the only people using this functionality of SalmonTools, maybe it would make sense to bring it into the main repo as well?
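If compression moves into the extraction step, the output can be written compressed directly instead of in a separate pass. This sketch assumes gzip (the thread only says "zipped") and uses Python's standard `gzip` module. `write_fastq_gz` is a hypothetical helper that takes 4-tuples of FASTQ lines (header, sequence, separator, quality).

```python
import gzip
from typing import Iterable, Tuple


def write_fastq_gz(records: Iterable[Tuple[str, str, str, str]], path: str) -> int:
    """Write 4-line FASTQ records to a gzip-compressed file; return the count written."""
    count = 0
    with gzip.open(path, "wt") as out:  # text mode on top of a gzip stream
        for header, seq, plus, qual in records:
            out.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
            count += 1
    return count
```

Because records are consumed from an iterable, this can sit directly downstream of a streaming filter without holding all reads in memory.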

cgreene · Sep 28 '18 15:09

@cgreene, noted. I will create a pull request.

hiraksarkar · Sep 28 '18 15:09

Is it the same as #1?

cgreene · Sep 28 '18 15:09

Yup, I guess I created it earlier.

hiraksarkar · Sep 28 '18 15:09

Tagging @rob-p, since it seems I don't have write access to this repo.

hiraksarkar · Sep 28 '18 15:09

Here are some example files which caused the problem: https://zenodo.org/record/1438469

Here's a screenshot of our Salmon pipeline, without and with SalmonTools. Salmon takes ~5-10 minutes, while SalmonTools takes multiple hours:

[Screenshot: Salmon pipeline run times, without and with SalmonTools]

Miserlou · Sep 28 '18 20:09