fermi hangs on a very small dataset

Open ctSkennerton opened this issue 11 years ago • 5 comments

I've run fermi on a very small dataset containing 22 FASTA records using the following command:

run-fermi.pl -k 200 -p cdhitout_0.85 <reads.fa>  | make -f -

However, fermi hangs indefinitely. When I look at top, I can see that fermi ropebwt is constantly in the sleep state:

45288 uqcskenn  20   0 24188  740  584 S    3  0.0   1:08.84 fermi ropebwt -a bcr -v3 -btf cdhitout_0.85.ec.tmp -
45447 uqcskenn  20   0 24188  740  584 S    2  0.0   1:08.00 fermi ropebwt -a bcr -v3 -btf cdhitout_0.90.ec.tmp -

I've tried both the git HEAD and release 1.1.

<reads.fa> contains:

>M00920:10:000000000-A292A:1:1101:2305:13136:1
CTTCTGGTGAAACCCACTCCCATGGTGTGACGGGCGGTGTGTACAAGACCCGGGAACGTATTCACCGCGACATGCTGATCCGCGATTACTAGCGATTCCGACTTCACGCAGTCGAGTTGCAGACTGCGATCCGGACTACGATCGGCTTTGTGAGATTCGCTCCGCCTCGCGGCTTGGCAACCCTCTGTACCGACCATTGTATGACGTGTGAAGCCCTACCCATAAGGGCCATGAGGACTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTCTCGTTAAAGTGCCCAACCAAATGATGGCAATTAACGACAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACAT
>M00920:10:000000000-A292A:1:1101:24216:16298:1
CCCTTATCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAGGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCATCACACCATGGGAGTGGGTTGCTCCAGAAGTAGCTAGTCTAACCGCAAGGGGGACGGTTACCA
>M00920:10:000000000-A292A:1:1110:4340:7240:1
CAGATTGAACGCTGGCGGCATGCTTTACACATGCAAGTCGAACGGCAGCGGGGGCTTCGGCCCGCCGGCGAGTGGCGAACGGGTGAGTAATGCATCGGAACGTACCCATGTTGTGGGGGATAACGTAGCGAAAGCTACGCTAATACCGCATAAGCCCTGAGGGGGAAAGCGGGGGATTCTTCGGAACCTCGCGCAATTGGAGCGGCCGATGTCAGATTAGCTAGTTGGTAGGGTAAAGGCCTACCAAGGCGACGATCTGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGGACTGAGACACGGCCCGGACTCCTCCGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGCAAGGGTGATC
>M00920:10:000000000-A292A:1:1110:21042:16009:1
ACCCAGGGGGCTGCCTTCGCCATCGGTGTTCCTCCACATCTCTACGCATTTCACTGCTACACGTGGAATTCCACCCCCCTCTGCCACACTCGAGCCTTGCAGTCACAAACGCATTTCCCAGGTTAAGCCCGGGGATTTCACATCTGTCTTACAAAGCCGCCTGCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTACGTATTACCGCGGCTGCTGGCACGTAGTTAGCCGGTGCTTGTTCTTCAGTTCCCGTCATTGACAGTCTATGTTAGACCCCGCCGTTTCGTTCCTGCCGAAAGAGCTTTACAACCCGAAGGCCTTCTTCACTCACGCGGAATGGCTGGATCAGGGT
>M00920:10:000000000-A292A:1:1101:19922:4365:1
ATCTAATCCTGTTTGCTCCCCACGCTTTCGTGCATGAGCGACAGACCAGGTCCAGGGGGCTGCCTTCGCCTTCGATGTTCCTCCTGATATCTACGTATTTCACTGCTACACCCGGATTTCCACCCCCCTCTACCGCACTCTAGGCACACAGTCACAAACGCATTTCCCAGGTTAAGCCCGGGGGTTTCAAATCTGAATTATTTAACCGCCTGCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTCGGTATGACCGCGACTGCCAGCGGGTAGGAAGGCGGTACTTTTTATTCCGGTGCCGACATCCTCCCCGGATATTCACCGCGGCTATTTCTTTCCGTCCGACAGAGGTGTAAAACCCGAAGGCGAGCTTG
>M00920:10:000000000-A292A:1:1101:18095:13295:1
GGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGGAAGCCTGATCCAGCCATGCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTTTCGGTGGGGAAGAAATTGCACGGGTTAATACCCTGTGTAGATGACGGTACCCGACTAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTGGTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGAGACTGCCAAGCTGGAGTGTGGCAGAGGGGGGTGGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATCAGGAG
>M00920:10:000000000-A292A:1:2102:3086:14182:1
GTAGTGACCCAGGGGGCTGCCTTCGCCATCGGTGTTCCTCCACATCTCTACGCATTTCACTGCTACACGTGGAATTCCACCCCCCTCTGCCACACTCCAGCCTGGCAGTCTCAAATGCAGTTCCCAGGTTGAGCCCGGGGCTTTCACATCTGACTTACCAAACCGCCTGCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTACGTATTAACGCGGCTGCTGGCACGTAGTTCGCCGGTGCTTCTTAGTCGGGTACCGTCATCTACACAGGATATTAGCCCGTGCAATTTCTTCCCCACCGAAAGAGCTTTACAACCCGAAGGCCTTCTTCACTCACGCGGCATGGCTGGATCAGGCTTCCGCCC
>M00920:10:000000000-A292A:1:2108:13711:22806:1
GATTAAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGGGCAACCCTGGTGGCGAGTGGTGGACGGGTGAGTAAAGCATCGGAACGTATCCTGAAGTGGAGTATAACGTAGCGAAAGTTACGCTAATACCGCATAGTCTGTGAGCAGGAAAGCAGGGGATCGCAAGACCTTGCGCTCTGGGAGCGGCCGATGTCGGATTAGCTAGTTGGGGGGGTAAAGGCCTACCAAGGCGCGGCTCCGTAGCGGGGATTGGAGTATGAAACGCCACACTGTGACTGAGAAACGGCCCGGACTCCTACGTGAGGAAGCAGCGGTGAATTTTTTCCAATGGGTTCAAGCC
>M00920:10:000000000-A292A:1:2110:11377:9313:1
GCATCGGAACGTGCCCTGGAATGGGGGATAACGTAGCGAAAGTTACGCTAATACCGCATATTCTGTGAGCAGGAAAGCAGGGGATCGCAAGACCTTGCGTTCTGGGATCGGCCGATGTCGTATGAGCTAGTTGGTGGGGAAAAGGCCTACCACGGCGACGATCCGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCCGTGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTTTCGGTGGGGAAGAAATTGCATGGGTTAATTCCC
>M00920:10:000000000-A292A:1:1105:17264:25408:1
GAATTACTGGGCGTAAAGCGTGCGCAGGCGGCGCCATAAGACAGACGTGAAATCCCCGGGCTTAACCTGGGAACTGCGTTTGTGACTGTGGTGCTCGAGTGTGGCAGAGGGGGGTGGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAACACCGATGGCGAAGGCAGCCCCCTGGGTCAACACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGCGAACTAGGTGTTGGGGAAGGAGACGTTCTTAGTACCGCAGCTAACGCGTGAAGTTCGCCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATGGACA
>M00920:10:000000000-A292A:1:2105:19316:26848:1
ATCCGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCCATTCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTTTCAGCAGGAACGAAACGGCTCTCTCTAACATAGGGAGTTAATGACGGTACCTGAAGAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCACAGGCGGCGCCATAAGACAGATGTGAAATCCCCGGGCTTAACCTGGGAAC
>M00920:10:000000000-A292A:1:1111:13173:15398:1
TGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAACTTGCCAGAGATGGCTTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCACCGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTTCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGCTGAAGTCAAGTCATCATGGCCCTTATGGGTAGGGCGTCACACGTCATACAATGGTCGGAACAGAGGGTTGCCAAGCCGCGAGGTGGAGCCAATCCCAGAAAACCGATCGTAGTCCGGATCGC
>M00920:10:000000000-A292A:1:1102:8010:26367:1
GCCTTACACATGCAAGTCGAACGGCAGCGGAACTTCGGGTGCCGGCGAGTGGCGAACGGGTGAGTAATGCATCGGAACGTGCCATTGAGTGGGGGATAACGTAGCGAAAGTTGCGCTAATACCGCATATTCTGTGAGCAGGAAAGCAGGGGACCGCAAGGCCTTGCGCTCTTTGAGCGGCCGATGTCAGATTAGCTAGTTGGTGAGGTAAAGGCTTACCAAGGCGACGATCTGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGAGTGAAGAAGGCCTTCGGGT
>M00920:10:000000000-A292A:1:1106:8344:21464:1
GTTCCTACCATTGTAGCACGTGTGTAGCCCTGGGCATAAAGGCCATGATGACTTGACATCATCCCCTCCTTCCTCGCGTCTTACGACGGCAGTTTCTTTAGAGTTCCCAGCTTAACCTGTTGGCAACTAAAGATAGGGGTTGCGCTCGTTGCGGGACTTAACCCAACACCTCACGGCACGAGCTGACGACAGCCATGCAGCACCTGTGTGACGGCTCCCTTTCGGGCACCCTCAACTCTCATCGAGGTTCCGTCCATGTCAAGGGTAGGTAAGGTTTTTCGCGTTGCATCGAATTAATCCACATCATCCACCGCTTGTGCGGGTCCCCGTCAATTCCTTTGAGTTTTAATC
>M00920:10:000000000-A292A:1:1109:11262:3539:1
TTTACCCACCCAACACCTAGTTGACATAGTTTAGGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTACCCACGCTTTCGTGCATGAGCGTCAGTATCGGCCCAGGGGGCTGCCTTCGCCATAGGTGTTCCTCCCCATCTCTACGCTTTTCACTGCTACACGTGGAATTCCACCCCCCTCTGCCGTACTCTAGTGAGGCAGTCACAAACGCAGTTCCCAGGTTACGCCCGGGGATTTCACGCCTGTCTTACCAATCCGCCTGCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTACGTATTACCGCGGCTGCTGGCACGTAGTTAGCCGGTGCTTCTTATGCCGGTACCG
>M00920:10:000000000-A292A:1:1113:21063:11515:1
ACACAGGGTATTAACCCATGCGATTTCTTCCCGGCCGAAAGAGCTTTACAACCCGAAGGCCTTCTTCACTCACGCGGCATGGCTGGATCAGGGTTGCCCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGGAGGAGTCTGGCCCGTGTCTCAGTTCCAGTGTGGCGGATCATCCTCTCAGACCCGCTCCAGATCGTCGCCTTGGTAAGCCGTTACCTCACCAACTAGCTAATCTGACATAGGCCGCTCAAAGAGCGCAAGGCCTTGCGGTCCCCTGCTTTCCTGCTCACAGAATATGCGGTATTAGCGCAACTTTCGCTACGTTATCCCCCACTCAATGGCACGTTCCGATGCATTACTCACC
>M00920:10:000000000-A292A:1:2109:18065:11577:1
CCTTTGTATTGTCCATTGTAGCACGTGTGTAGCCCAAATCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTCAACTTAGAGTGCCCAACTTAATGATGGCAACTAAGCTTAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAGCCATGCAGCACCAGTGTGACGGCTCCCTTTCGGGCACCCTCAACTCTCATCGAGGTTCCGTCCATGTCAAGGGTAGGTAAGGTTTTTCGCGTTGCATCGAATTAATCCACATCATCCACCGCTTGTGCGGGTCCCCGTCAATTCCTTTGAGTTTTAATC
>M00920:10:000000000-A292A:1:2113:10809:18271:1
GTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGATTAATTCGATGCAACGCGAAAAACCTCACCTACCCTTGACATGGACGGAACCTCGATGAGAGTTGAGGGTGCCCGAAAGGGAGCCGTCACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTAAGCTTAGTTGCCATCATTAAGTTGGGCACTCTAAGTTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGATTTGGGCTACACACGTGCTACAA
>M00920:10:000000000-A292A:1:2101:18998:6292:1
GTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAACTTAGCAGAGATGCTTTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAAGGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCCTCATGGCCCTTATGGGTAGGGCTTCACACGTCATACAATGGTCGGAACAGAGGGTTGCCAAGCCGCGAGGTGGAGCCAATCCCAGAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGAC
>M00920:10:000000000-A292A:1:2108:17778:22051:1
ATCCACAGAACTTAGCAGAGATGCTTTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTGTCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGGGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATC
>M00920:10:000000000-A292A:1:1104:5131:15907:1
GTACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGTCGACTAGTCGTTCGGAGCAGCAATGCACTGAGTGACGCAGCTAACGCGTGAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGATTAATTCGATGCAACGCGAAAAACCTTACCTACCCTTGACATGTCTGGAGCCTTGGTGAGAGCCGAGGGTGCCTTCGGGAGCCAGAACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGT
>M00920:10:000000000-A292A:1:1113:7839:16644:1
CGTTTAGGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGTGCATGAGCGTCAGTACAGGCCCAGGGGGCTGCCTTCGCCATCGGTGTTCCTCCTGATCTCTACGCATTTCACTGCTACACCAGGAATTCCACACACTTCTGCCGTACTCTAGCCTTGCAGTCACAAACGCAGTTCCCAGGTTAAGCCCGGGGATTTCACATCTGTCTTACAAAAACGCCTCCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTACGTTTTACCGCGGCTGCTGGCACGTTTTTAGCCGGTGCTTCTTAGTCCGGTACCGTCATCCATGGCCTATGTTAGAGAC

ctSkennerton, Mar 01 '13 06:03

With your command line, fermi should not use ropebwt. Can you find the string ropebwt in your makefile?

lh3, Mar 01 '13 06:03

Yes, I can. The full makefile is shown below:

FERMI=fermi
UNITIG_K=200
OVERLAP_K=240

all:cdhitout_0.85.p2.mag.gz

# Construct the FM-index for raw sequences
cdhitout_0.85.raw.fmd:../cdhitout_0.85.fa
    (cat ../cdhitout_0.85.fa) | $(FERMI) ropebwt -a bcr -v3 -btNf cdhitout_0.85.raw.tmp - > $@ 2> $@.log

# Error correction
cdhitout_0.85.ec.fq.gz:cdhitout_0.85.raw.fmd
    (cat ../cdhitout_0.85.fa) | $(FERMI) correct -t 2  $< - 2> $@.log | gzip -1 > $@

# Construct the FM-index for corrected sequences
cdhitout_0.85.ec.fmd:cdhitout_0.85.ec.fq.gz
    $(FERMI) fltuniq $< 2> cdhitout_0.85.fltuniq.log | $(FERMI) ropebwt -a bcr -v3 -btf cdhitout_0.85.ec.tmp - > $@ 2> $@.log

# Generate unitigs
cdhitout_0.85.p0.mag.gz:cdhitout_0.85.ec.fmd
    $(FERMI) unitig -t 2 -l $(UNITIG_K) $< 2> $@.log | gzip -1 > $@

cdhitout_0.85.p1.mag.gz:cdhitout_0.85.p0.mag.gz
    $(FERMI) clean $< 2> $@.log | gzip -1 > $@
cdhitout_0.85.p2.mag.gz:cdhitout_0.85.p1.mag.gz
    $(FERMI) clean -CAOFo $(OVERLAP_K) $< 2> $@.log | gzip -1 > $@

ctSkennerton, Mar 01 '13 06:03

I see. I was using an old version of run-fermi.pl. More recent versions use ropebwt by default. Anyway, I can see the problem now: fltuniq has filtered out all the reads, while ropebwt is expecting some input and thus hangs for some reason. For the time being, you can edit the makefile and change the line containing fltuniq to:

cat $< | $(FERMI) ropebwt -a bcr -v3 -btf cdhitout_0.85.ec.tmp - > $@ 2> $@.log

This skips fltuniq. I will look into the ropebwt issue later. In any case, you probably won't get a good assembly from these reads.
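
For anyone who would rather script the workaround than hand-edit the generated makefile, here is a minimal sketch in Python. It is not part of fermi. The function name skip_fltuniq and the sample recipe are illustrative assumptions, and it only handles recipe lines shaped like the fltuniq line in the makefile posted above.

```python
import re

# Matches the start of a recipe line of the form
#   $(FERMI) fltuniq $< 2> <logfile> | ...
# and captures the leading indentation so it can be preserved.
_FLTUNIQ_PREFIX = re.compile(
    r"^([ \t]*)\$\(FERMI\) fltuniq \$< 2> \S+ \| ",
    re.MULTILINE,
)


def skip_fltuniq(makefile_text: str) -> str:
    """Return makefile_text with the fltuniq stage replaced by `cat $< | `.

    Lines that do not start with the fltuniq command are left untouched.
    """
    return _FLTUNIQ_PREFIX.sub(lambda m: m.group(1) + "cat $< | ", makefile_text)


# Sample recipe line modelled on the makefile in this thread
# (the trailing `2> $@.log` redirect is an assumption about the generated file).
SAMPLE_RECIPE = (
    "\t$(FERMI) fltuniq $< 2> cdhitout_0.85.fltuniq.log | "
    "$(FERMI) ropebwt -a bcr -v3 -btf cdhitout_0.85.ec.tmp - > $@ 2> $@.log\n"
)

print(skip_fltuniq(SAMPLE_RECIPE), end="")
```

To use it, the output of run-fermi.pl would first have to be saved to a file rather than piped straight into make, then rewritten with skip_fltuniq before running make on the result.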

lh3, Mar 01 '13 06:03

For small files, we'd actually be better off not using fltuniq anyway. I should consider adding an option to skip fltuniq altogether.

lh3, Mar 01 '13 06:03

Thanks. Specifying -B in run-fermi.pl prevents the hang as well.

ctSkennerton, Mar 01 '13 06:03