
xenome classify hangs

Open mjafin opened this issue 7 years ago • 50 comments

Hi there, I'm testing xenome classify on an aws instance (latest Ubuntu) and it hangs after about 50 minutes. The command I used for launching the process is

xenome classify -T 8 -M 28 -P /data/Miika/idx --pairs -i dna/SRR1176814_1.fastq.gz -i dna/SRR1176814_2.fastq.gz --output-filename-prefix SRR1176814 -v > output_stats_SRR1176814.txt;

This is where it stops:

...
Tue Jan  3 10:21:59 2017        info    46700000 reads
Tue Jan  3 10:22:05 2017        info    46800000 reads
Tue Jan  3 10:22:12 2017        info    46900000 reads
Tue Jan  3 10:22:18 2017        info    47000000 reads
Tue Jan  3 10:22:24 2017        info    47100000 reads
Tue Jan  3 10:22:31 2017        info    47200000 reads
Tue Jan  3 10:22:37 2017        info    47300000 reads

The sample (SRR1176814) has 47312349 reads so it looks like it's getting to the end but then nothing happens. The process is still visible in top:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
32605 ubuntu    20   0  672200  13852   5756 S   0.0  0.0 403:05.52 xenome

output_stats_SRR1176814.txt is empty.

Any ideas?

mjafin avatar Jan 03 '17 11:01 mjafin

Far too many ideas. Need more information to narrow it down.

First off, please confirm that all the unit tests succeeded on this platform.

Secondly, can you confirm that you tried running it more than once and got the same behaviour? If the bug is intermittent, that narrows down the possibilities.

Thirdly, a little bit of information. Could you please show me the output of:

ls -l dna/SRR1176814*.gz
ls -l /data/Miika/idx*

Finally, let's try to create a cut-down test case. Could you please try this?

gunzip -c dna/SRR1176814_1.fastq.gz | head -n 4000 | gzip -9 -c > dna/test_1.fastq.gz
gunzip -c dna/SRR1176814_2.fastq.gz | head -n 4000 | gzip -9 -c > dna/test_2.fastq.gz

Then run xenome classify on dna/test_1 and dna/test_2 using the same options.

If that hangs too (should be much quicker), then please send us the cut-down input files. Either attach them to the ticket, or (if you can't let the public see them) email them to me.

Deguerre avatar Jan 04 '17 00:01 Deguerre

Unit tests were fine when I compiled gossamer.

I ran several samples yesterday and all showed the same behaviour.

Here's the ls output:

ls -l dna/SRR1176814*.gz
-rw-rw-r-- 1 ubuntu ubuntu 4541431597 Jan  1 11:15 dna/SRR1176814_1.fastq.gz
-rw-rw-r-- 1 ubuntu ubuntu 4261961919 Jan  1 11:16 dna/SRR1176814_2.fastq.gz

and

ls -l /data/Miika/idx*
-rw-rw-r-- 1 ubuntu ubuntu         24 Jan  1 20:50 /data/Miika/idx-both.header
-rw-rw-r-- 1 ubuntu ubuntu  169401184 Jan  1 20:50 /data/Miika/idx-both.kmers-d0
-rw-rw-r-- 1 ubuntu ubuntu  291033728 Jan  1 20:50 /data/Miika/idx-both.kmers-d1
-rw-rw-r-- 1 ubuntu ubuntu         64 Jan  1 20:50 /data/Miika/idx-both.kmers.header
-rw-rw-r-- 1 ubuntu ubuntu 1095959376 Jan  1 20:50 /data/Miika/idx-both.kmers.high-bits
-rw-rw-r-- 1 ubuntu ubuntu 8945415320 Jan  1 20:50 /data/Miika/idx-both.kmers.low-bits.lwr
-rw-rw-r-- 1 ubuntu ubuntu 4472707660 Jan  1 20:50 /data/Miika/idx-both.kmers.low-bits.upr
-rw-rw-r-- 1 ubuntu ubuntu  559088464 Jan  2 12:39 /data/Miika/idx-both.lhs-bits
-rw-rw-r-- 1 ubuntu ubuntu  559088464 Jan  2 12:39 /data/Miika/idx-both.rhs-bits
-rw-rw-r-- 1 ubuntu ubuntu         24 Jan  1 17:05 /data/Miika/idx-graft.header
-rw-rw-r-- 1 ubuntu ubuntu   92224224 Jan  1 17:05 /data/Miika/idx-graft.kmers-d0
-rw-rw-r-- 1 ubuntu ubuntu  141325168 Jan  1 17:05 /data/Miika/idx-graft.kmers-d1
-rw-rw-r-- 1 ubuntu ubuntu         64 Jan  1 17:05 /data/Miika/idx-graft.kmers.header
-rw-rw-r-- 1 ubuntu ubuntu  565383992 Jan  1 17:05 /data/Miika/idx-graft.kmers.high-bits
-rw-rw-r-- 1 ubuntu ubuntu 4751176496 Jan  1 17:05 /data/Miika/idx-graft.kmers.low-bits.lwr
-rw-rw-r-- 1 ubuntu ubuntu 2375588248 Jan  1 17:05 /data/Miika/idx-graft.kmers.low-bits.upr
-rw-rw-r-- 1 ubuntu ubuntu         24 Jan  1 19:33 /data/Miika/idx-host.header
-rw-rw-r-- 1 ubuntu ubuntu   80819808 Jan  1 19:33 /data/Miika/idx-host.kmers-d0
-rw-rw-r-- 1 ubuntu ubuntu  143517136 Jan  1 19:33 /data/Miika/idx-host.kmers-d1
-rw-rw-r-- 1 ubuntu ubuntu         64 Jan  1 19:33 /data/Miika/idx-host.kmers.header
-rw-rw-r-- 1 ubuntu ubuntu  532106144 Jan  1 19:33 /data/Miika/idx-host.kmers.high-bits
-rw-rw-r-- 1 ubuntu ubuntu 4218730922 Jan  1 19:33 /data/Miika/idx-host.kmers.low-bits.lwr
-rw-rw-r-- 1 ubuntu ubuntu 2109365461 Jan  1 19:33 /data/Miika/idx-host.kmers.low-bits.upr

Building the index using 8 cores took almost a day but it did finish OK.

I ran a test using the first 1000 reads as you suggested and it finished OK, which is odd. Could this have something to do with how things are parallelised and the communication between the threads? Here's the command I used:

xenome classify -T 8 -M 28 -P /data/Miika/idx --pairs -i dna/test_1.fastq.gz -i dna/test_2.fastq.gz --output-filename-prefix test -v > test.txt

mjafin avatar Jan 04 '17 10:01 mjafin

As this is all public data, if you're interested you can download the fastq.gz files here https://www.ebi.ac.uk/ena/data/view/SRR1176814

It looks like all the output fastq files are correctly produced though so I was able to pull together the stats I needed for my comparison in v2 of https://f1000research.com/articles/5-2741/v1 - results are very similar to our alignment based algorithm.

mjafin avatar Jan 04 '17 10:01 mjafin

Same issue here. I also tried a small fastq and it still did not work. I tried different options as well, e.g., single thread and single input file, with the same result. Can someone look into this issue? test_01.fastq.gz test_02.fastq.gz

Thanks

billnjcn111 avatar Jan 04 '17 15:01 billnjcn111

Thanks for that. The information from ls also ruled out the old gzipped-file-is-an-exact-multiple-of-the-io-buffer-size issue that we found in a very old version of the gzip filter.

I suspect it's the job manager. We'll take a look.

Deguerre avatar Jan 05 '17 05:01 Deguerre

Did you find any clue yet? Thanks

billnjcn111 avatar Jan 12 '17 16:01 billnjcn111

Same issue here, xenome classify hangs after the work is finished.

zz2liu avatar Jan 20 '17 16:01 zz2liu

I just got back from holidays. Picking this up again.

Deguerre avatar Jan 27 '17 03:01 Deguerre

Deguerre, How is it going? Have you found any clue how to fix the bug? ThX

billnjcn111 avatar Feb 17 '17 21:02 billnjcn111

I see the same problem with my data: xenome classify just hangs after all output files are generated. Does anyone have updates on the topic?

Thanks

danielgerlach avatar Mar 28 '17 11:03 danielgerlach

Anything? I'm also seeing the same problem here.

serverhorror avatar May 08 '17 11:05 serverhorror

Yes, I get the same issue. It did not happen when I tested xenome on a small test sample of reads (~4000), but when I ran it on all my samples (tens of millions of reads per sample) then it hangs after completion (or what looks like completion).

murphycj avatar Jun 28 '17 17:06 murphycj

Just as an update, this is turning out to be a very nasty problem caused by a mismatch between two different threading models. We decided that for the open source release we should use the standard C++ threading system rather than our previous solution which we couldn't easily maintain. The hanging is caused by some of the old code relying on some detail in the previous model that nobody can remember because it was written so long ago.

Only the kmer set construction in Xenome seems to be affected. Everything else seems to work.

None of the Gossamer authors are being paid to work on this, so we have to work on it around our day jobs. As you all probably have worked out, the problem only happens on large examples, which means each individual test takes a while.

I am only speaking on behalf of myself, but I'm sure the other authors agree that we're very sorry about this, and we all want to get this finished as quickly as possible so everyone can use it. Please bear with us.

Deguerre avatar Jun 28 '17 23:06 Deguerre

Thanks for working toward fixing this!

murphycj avatar Jun 29 '17 12:06 murphycj

I am facing the same issue. Has this bug been fixed now?

kannabirannandakumar avatar Aug 24 '17 01:08 kannabirannandakumar

I've been running xenome classify for a week on macOS Sierra (10.12.6). Is this a macOS-specific issue?

bonohu avatar Aug 24 '17 02:08 bonohu

I'm also experiencing the same issue and looking forward to the next fix.

In the meantime I'm attempting to whittle down the original read files to see how large a file will still work w/out hanging. I'm down to 0.5% (yes, half of a percent) of the original; this equates to 2 read files of around 150 MB uncompressed, and it is still hanging.

Just curious what the largest file anyone has been able to run w/out hanging, and on what set up?

jasonwork9941 avatar Sep 11 '17 20:09 jasonwork9941

Same issue. I've been using your Xenome program for a patient xenograft sample. Your algorithm works well; however, I've been running it on bsub, and it seems to run forever, even though the files have been output and are considered finished. For example, if I have a 100,000 read fastq that has been input to xenome, it takes about a day to complete the indexing, and outputting the different .fastq files is relatively quick after indexing. I checked the files, and the read counts of all five files (ambiguous, neither, both, mouse, and human) add up to 100000 almost immediately after the files are created (I would say max 20 minutes), but for some reason the program still runs forever and ever. After two days, the jobs are still "running".

To be honest, I don't care much, and I'm willing to write code around it to make sure that the files add up to the original read count, as long as the output is accurate. Can anyone confirm that their output is accurate? For mine it seems that Xenome reaches the target accuracy, so I'm assuming that once xenome has "finished" the output is accurate and considered done.
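A check along those lines could look like the sketch below. It assumes 4-line FASTQ records and single-end output files named `<prefix>_<class>.fastq` with the classes graft, host, both, neither and ambiguous, as in the log output posted in this thread; paired-end runs may name their files differently. `count_reads` and `check_counts` are hypothetical helper names, not part of xenome.

```shell
#!/usr/bin/env bash
# Count reads in a FASTQ file, assuming 4 lines per record.
count_reads() {
    local f="$1" lines
    case "$f" in
        *.gz) lines=$(gzip -dc "$f" | wc -l) ;;
        *)    lines=$(wc -l < "$f") ;;
    esac
    echo $(( lines / 4 ))
}

# Compare the input read count with the sum over the five output classes.
# Returns 0 when they match, 1 when they differ, 2 if an output is missing.
check_counts() {
    local input="$1" prefix="$2" total=0 expected class f
    for class in graft host both neither ambiguous; do
        f="${prefix}_${class}.fastq"
        [ -f "$f" ] || { echo "missing $f" >&2; return 2; }
        total=$(( total + $(count_reads "$f") ))
    done
    expected=$(count_reads "$input")
    echo "input=$expected outputs=$total"
    [ "$expected" -eq "$total" ]
}
```

Called as `check_counts sample_R1.fastq.gz sample_merged`, it prints both totals and returns non-zero on a mismatch. Matching counts only show that every read was written somewhere; they say nothing about whether the classification itself is correct.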

maheetha avatar Oct 13 '17 21:10 maheetha

I can confirm too. Classify hangs after running for about 20 minutes or so. In my case the logs show that after processing ~10 million reads, classify does not advance. Interestingly, the number of input fastq reads matches the number of output reads. I have to kill each process after 30 min. The program does its job but cannot finish.

Indexing also took indefinitely long. I played with the -M and -T parameters and got it to work in about 8 hrs. Although I could go over 124GB of memory on the cluster with 16 threads, I used 64GB and 12 threads to make it work. This is arbitrary, with no explanation. I wish the documentation were more detailed and clearer. But xenome in gossamer works, so I can move on to the next step.

obwan74 avatar Jan 16 '18 12:01 obwan74

(My apologies for now deleted post, I used -I io -i for classify and fastq data - was not reporting error and just hanging)

I could now run a classify job but after processing the 2000000 reads it does not exit. Can I safely kill it manually?

splaisan avatar Mar 04 '18 11:03 splaisan

I found a work around:

  • run xenome for the first batch of your data
  • estimate the running time by 'xenome' disappearing from your 'top'
  • for each of the rest of your data: timeout xenome classify ...

Hope it helps.
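The steps above can be sketched as a small wrapper. This is a minimal sketch, assuming GNU coreutils `timeout` is available (on macOS it is typically installed as `gtimeout`); `run_with_limit` is a hypothetical helper name, and the limit has to come from your own first, manually timed batch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command under a wall-clock limit.
# GNU timeout exits with status 124 when it had to stop the command.
run_with_limit() {
    local limit="$1"; shift
    local status=0
    timeout "$limit" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "stopped after $limit: $*" >&2
    fi
    return "$status"
}
```

For example, `run_with_limit 2h xenome classify ...` with the options you normally pass. A timeout can also cut off a run that was still working, so it is worth checking the output read counts afterwards.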

zz2liu avatar May 08 '18 16:05 zz2liu

It doesn't exit at all. We've stopped using it since it didn't work for us.

serverhorror avatar May 28 '18 11:05 serverhorror

True, but it does the job in our case; what you lack is a clean end of the run and the stats. We compared it to other tools and found that it does what it is expected to do. It is only a pity that the developers do not put time into fixing the exit issue.

splaisan avatar May 28 '18 11:05 splaisan

Hello,

I'm facing a similar issue. I'm running xenome classify on my computer but it hangs or takes a really long time to run.

xenome classify -v -T 8 -M 8 -P ../../reference_sequence/xenome_idx/idx -i sample_R1.fastq.gz --output-filename-prefix sample_merged
Fri Jun 29 12:20:01 2018	info	opening buffer 0 /var/folders/mp/xd5y68q53zjdvvr81k8d30g40000gp/T//1530300001-47497-0-classbuf-0
Fri Jun 29 12:20:01 2018	info	performing 2 passes
Fri Jun 29 12:20:01 2018	info	parsing sequences from 44279_11_merged_R1.fastq.gz
Fri Jun 29 12:20:01 2018	info	writing to 44279_11_merged_neither.fastq
Fri Jun 29 12:20:01 2018	info	writing to 44279_11_merged_both.fastq
Fri Jun 29 12:20:01 2018	info	writing to 44279_11_merged_graft.fastq
Fri Jun 29 12:20:01 2018	info	writing to 44279_11_merged_host.fastq
Fri Jun 29 12:20:01 2018	info	writing to 44279_11_merged_ambiguous.fastq
Fri Jun 29 12:20:01 2018	info	pass 0
Fri Jun 29 12:20:01 2018	info	parsing sequences from 44279_11_merged_R1.fastq.gz

It seems to be stuck. The process is visible in top but shows up as sleeping.

Processes: 366 total, 2 running, 5 stuck, 359 sleeping, 1657 threads                       12:26:52
Load Avg: 2.02, 2.04, 2.14  CPU usage: 3.14% user, 2.17% sys, 94.68% idle
SharedLibs: 172M resident, 35M data, 12M linkedit.
MemRegions: 114419 total, 2116M resident, 64M private, 3151M shared.
PhysMem: 8167M used (1922M wired), 22M unused.
VM: 1646G vsize, 1097M framework vsize, 102331040(0) swapins, 106855192(0) swapouts.
Networks: packets: 409129203/172G in, 607434675/394G out.
Disks: 98600050/3125G read, 21012651/1205G written.

PID    COMMAND      %CPU TIME     #TH   #WQ  #PORT MEM    PURG   CMPRS  PGRP  PPID  STATE
97228  xenome       0.0  06:17.01 7     0    19    12K    0B     4436K  97228 50512 sleeping

I'd really appreciate your help on this!

rushikapandya avatar Jun 29 '18 19:06 rushikapandya

I do not think we will ever get help on this... In our experience the job is done at that stage. You can kill it and check that the sum of the number of reads in all outputs matches the input. What we miss is the summary table, but you can produce that too from the outputs.
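A rough sketch of rebuilding a summary from the outputs, assuming 4-line FASTQ records and single-end output names of the form `<prefix>_<class>.fastq` as in the log posted earlier in this thread. The three-column layout (class, reads, percent) is made up for illustration and is not xenome's own stats format; `summarise` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print class, read count and percentage of the total for each output file.
# Assumes all five output files exist and use 4 lines per read.
summarise() {
    local prefix="$1" class
    for class in graft host both neither ambiguous; do
        printf '%s\t%d\n' "$class" \
            $(( $(wc -l < "${prefix}_${class}.fastq") / 4 ))
    done | awk -F'\t' '
        { name[NR] = $1; n[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s\t%d\t%.2f\n", name[i], n[i],
                       (total ? 100 * n[i] / total : 0)
        }'
}
```

Running `summarise sample_merged` prints one tab-separated line per class, with percentages relative to the sum over all five files.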

splaisan avatar Jun 29 '18 21:06 splaisan

Hello,

Thank you for your reply. But unfortunately my reads don't add up. I tried running it again using different -M and -T settings and it seems to be running though very slowly!!!

rushikapandya avatar Jul 03 '18 19:07 rushikapandya

I would use N threads and at least 4 GB of RAM per thread. Better 6 to 8 if you have homo+mus data. Also try with 10M reads first to see if that works. Good luck.
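As a worked example of that rule of thumb, the sketch below turns a thread count into matching -T and -M values. It assumes -M is given in gigabytes, as the commands in this thread suggest; `xenome_resources` is a hypothetical helper, and the 4 GB default is only the suggestion from this comment, not a documented requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print -T/-M values for a given thread count.
# Assumes -M is in gigabytes; the default of 4 GB per thread is a rule of thumb.
xenome_resources() {
    local threads="$1" gb_per_thread="${2:-4}"
    echo "-T $threads -M $(( threads * gb_per_thread ))"
}

xenome_resources 8      # 8 threads at 4 GB each: -T 8 -M 32
xenome_resources 8 6    # 8 threads at 6 GB each: -T 8 -M 48
```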


splaisan avatar Jul 04 '18 08:07 splaisan

Does anybody know of a workaround? I installed Xenome in a Docker container (Ubuntu 16.04) and indexing worked fine but the classify step hangs after parsing all reads without any particular message just like it did for all of you. I tried with FASTQ files containing either 25k or 2 million reads and both failed. Any alternative tool to use or other ideas?

romanhaa avatar Jul 04 '18 16:07 romanhaa

I have an older version of xenome that does not have this issue. Let me know if you need it, I can share.


kannabirannandakumar avatar Jul 04 '18 20:07 kannabirannandakumar

@kannabirannandakumar yes that would be fantastic!

romanhaa avatar Jul 04 '18 20:07 romanhaa