MeShClust
error: list not sorted
Hi, I ran MeShClust on 700k sequences and got the following error message:
Using 16 bit histograms
Counting 4-mers [======================================================] 100 %
Splitting data
Point pairs: 38
Sorting data [=========================================================] 100 %
Warning: Alignment may be too large for sampling
Before Pair: >158256496-stool1_revised_C820061_1_gene84242 strand:+, >158256496-stool1_revised_C820061_1_gene84242 strand:+
Before Pair: >158256496-stool1_revised_C820061_1_gene84242 strand:+, >158256496-stool1_revised_C844273_1_gene26404 strand:+
Before Pair: >158256496-stool1_revised_C820061_1_gene84242 strand:+, >158256496-stool1_revised_C850045_1_gene50883 strand:-
Before Pair: >158256496-stool1_revised_C820061_1_gene84242 strand:+, >158256496-stool1_revised_C928413_1_gene23126 strand:-
Alignment [============================================================] 100 %
positive=56 negative=1008
resizing positive
Vector size: 56 min size: 56
resizing negative
Vector size: 1008 min size: 56
index size: 952
positive=56 negative=56
Adding combo 18
new single feature 2
new single feature 16
Adding combo 6
new single feature 4
Adding combo 32
new single feature 32
bounds[0]: 0 to 16290
bounds[1]: 0.0944969 to 1
bounds[2]: 0 to 16290
bounds[3]: -0.188998 to 15.5225
Accuracy: 96.4286% Sensitivity: 100% Specificity: 92.8571%
Accuracy: 94.6429% Sensitivity: 100% Specificity: 89.2857%
Adding combo 1026
new single feature 1024
bounds[0]: 0 to 16290
bounds[1]: 0.0944969 to 1
bounds[2]: 0 to 16290
bounds[3]: -0.188998 to 15.5225
bounds[4]: 34393 to 65536
Accuracy: 98.2143% Sensitivity: 100% Specificity: 96.4286%
Accuracy: 100% Sensitivity: 100% Specificity: 100%
breaking from acc cutoff
Final: feat size is 4
Using 4 features Mar 9 2018
error: list not sorted===============> ] 40 %
terminate called after throwing an instance of 'int'
Can this be overcome? Many thanks, Matthieu
I'm having the same problem with one file. Did you solve it?
New commit should fix this issue
I have a different, but possibly related, error with the same error message when trying to cluster 500k short sequences. This is on the current Master.
❯❯❯ ~/software/MeShClust/bin/meshclust experiment.fasta --id 0.6 --threads 16 --output experiment.clstr
avg length: 74
Recommended K: 3
Reading in sequences [=================================================] 100 %
Using 8 bit histograms
Counting 3-mers [======================================================] 100 %
Splitting data
Point pairs: 38
Sorting data [=========================================================] 100 %
Before Pair: >JONATHAN:1:34:1:10053:35678:13647 1:N:0:1, >JONATHAN:1:34:1:50854:28360:84667 1:N:0:1
Before Pair: >JONATHAN:1:34:1:10069:82808:87017 2:N:0:1, >JONATHAN:1:34:1:26251:60815:76440 1:N:0:1
Before Pair: >JONATHAN:1:34:1:1007:70526:36221 2:N:0:1, >JONATHAN:1:34:1:77317:28403:22037 1:N:0:1
Before Pair: >JONATHAN:1:34:1:10179:75351:98929 2:N:0:1, >JONATHAN:1:34:1:74366:29020:95042 2:N:0:1
Alignment [============================================================] 100 %
positive=785 negative=735
resizing positive
Vector size: 785 min size: 735
index size: 50
resizing negative
Vector size: 735 min size: 735
positive=735 negative=735
Adding combo 18
new single feature 2
new single feature 16
Adding combo 6
new single feature 4
Adding combo 32
new single feature 32
bounds[0]: 0 to 3
bounds[1]: 0.632353 to 1
bounds[2]: 0 to 100
bounds[3]: -0.34485 to 1
Accuracy: 100% Sensitivity: 100% Specificity: 100%
Accuracy: 99.7283% Sensitivity: 100% Specificity: 99.4565%
breaking from acc cutoff
Final: feat size is 3
Using 3 features Sep 7 2018
error: list is not sorted
error: no bins to insert into, item not inserted
[1] 17949 segmentation fault ~/software/MeShClust/bin/meshclust experiment.fasta --id 0.6 16
Hi, I got the same problem. I want to cluster 40 bp sequences (10K) with --id 0.6:
bin/meshclust 100000_seqs_40_40_bp.fasta --id 0.60 --output 100000_seqs_40_40_bp.clstr --threads 6
avg length: 40
Recommended K: 2
Reading in sequences [=================================================] 100 %
Using 8 bit histograms
Counting 2-mers [======================================================] 100 %
Splitting data
Point pairs: 38
Sorting data [=========================================================] 100 %
Warning: Alignment may be too large for sampling
Before Pair: >A10028|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp, >A45634|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp
Before Pair: >A10034|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp, >A28459|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp
Before Pair: >A1003|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp, >A94460|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp
Before Pair: >A10065|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp, >A94460|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25|length: 40 bp
Alignment [============================================================] 100 %
positive=45 negative=977
resizing positive
Vector size: 45 min size: 45
resizing negative
Vector size: 977 min size: 45
index size: 932
positive=45 negative=45
Adding combo 18
new single feature 2
new single feature 16
Adding combo 6
new single feature 4
Adding combo 32
new single feature 32
bounds[0]: 0 to 2.22507e-308
bounds[1]: 0.709091 to 1
bounds[2]: 0 to 32
bounds[3]: -0.388057 to 1
Accuracy: 86.3636% Sensitivity: 86.3636% Specificity: 86.3636%
Accuracy: 91.3043% Sensitivity: 91.3043% Specificity: 91.3043%
Adding combo 1026
new single feature 1024
bounds[0]: 0 to 2.22507e-308
bounds[1]: 0.709091 to 1
bounds[2]: 0 to 32
bounds[3]: -0.388057 to 1
bounds[4]: 181.527 to 256
Accuracy: 81.8182% Sensitivity: 86.3636% Specificity: 77.2727%
Accuracy: 89.1304% Sensitivity: 100% Specificity: 78.2609%
Final: feat size is 4
Using 4 features Sep 5 2018
error: list is not sorted
error: no bins to insert into, item not inserted
Segmentation fault (core dumped)
Do you have a solution for this now? Thanks!
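For anyone who wants to try reproducing the report above without the original file, here is a minimal sketch that writes uniformly random 40 bp DNA sequences with headers shaped like the ones in the log. The function names, the seed, and the output file name are hypothetical, and the sequence count is left as a parameter; there is no guarantee that random data generated this way triggers the same error.

```python
import random


def random_fasta_records(n, length=40, seed=0):
    """Yield (header, sequence) pairs of uniformly random DNA.

    The header layout imitates the IDs visible in the log above;
    it is an approximation, not the reporter's actual generator.
    """
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    for i in range(1, n + 1):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        header = (
            f"A{i}|random sequence|A: 0.25|C: 0.25|G: 0.25|T: 0.25"
            f"|length: {length} bp"
        )
        yield header, seq


def write_fasta(path, records):
    """Write (header, sequence) pairs to `path` in FASTA format."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n{seq}\n")
```

Calling `write_fasta("random_40bp.fasta", random_fasta_records(10000))` produces a file that can be passed to `bin/meshclust` with the same `--id 0.60` option used in the report.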
No solution yet, but I am working on it. Do you have the data that caused the error? Thanks
Here is a sample of some data that causes this error.
https://gist.github.com/jgoodson/253f56ef4c49388304eb51fc42b9eeba
With this input, a call to MeShClust with default options does not crash and returns
Identity value does not match sampled data: Too many sequences below identity
If I specify an identity value, even the default of 0.90, I get the previous error:
error: list is not sorted
error: no bins to insert into, item not inserted
[1] 11218 segmentation fault ~/software/MeShClust/bin/meshclust exp5ks.fasta --id 0.90 --output /dev/null
Thanks
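The comparison described above (default options versus an explicit `--id`) can be scripted so the exit status of each run is recorded side by side. This is only a sketch: the helper names are hypothetical, the binary path has to be adjusted to the local install, and passing `--output /dev/null` in the default run as well is an assumption. Only the `--id` and `--output` flags that appear in this thread are used.

```python
import subprocess


def build_commands(fasta, binary="meshclust", identity="0.90", output="/dev/null"):
    """Return the two command lines to compare, keyed by a short label."""
    return {
        # Default options: no identity threshold given.
        "default": [binary, fasta, "--output", output],
        # Explicit identity threshold, as in the failing call above.
        "with_id": [binary, fasta, "--id", identity, "--output", output],
    }


def run_all(commands):
    """Run each command and collect (return code, tail of stderr).

    On POSIX a negative return code means the process was killed by a
    signal, e.g. -11 for a segmentation fault.
    """
    results = {}
    for label, cmd in commands.items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[label] = (proc.returncode, proc.stderr[-500:])
    return results
```

For example, `run_all(build_commands("exp5ks.fasta", binary="/path/to/meshclust"))` returns a dictionary whose `"with_id"` entry would be expected to show a negative return code if the segmentation fault reproduces.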
I'm sorry to bother you, but have you found a solution for this problem? Thanks
Not yet
Hi @benjamin-james -- I'm sure you're quite busy, but we're also hitting this problem. Can you help us understand if this is something likely to be fixed in the next few weeks, or is it something bigger that will require a significant amount of time?
Close, fixed in a few places but not in all cases yet
master should fix this bug
Hello. I'm still having these issues with MeShClust. What should I do? The sequences are around 1k bp long and there are around 300k of them.
avg length: 972
Recommended K: 4
Reading in sequences [=================================================] 100 %
Using 16 bit histograms
Counting 4-mers [======================================================] 100 %
Splitting data
Point pairs: 38
Sorting data [=========================================================] 100 %
Warning: Alignment may be too large for sampling
Before Pair: >align_id:1854781|asmbl_145 gene=PASA_cluster_114, >align_id:1942275|asmbl_87639 gene=PASA_cluster_75689
Before Pair: >align_id:1855658|asmbl_1022 gene=PASA_cluster_869, >align_id:1932409|asmbl_77773 gene=PASA_cluster_67074
Before Pair: >align_id:1855659|asmbl_1023 gene=PASA_cluster_870, >align_id:2204054|asmbl_349418 gene=PASA_cluster_288159
Before Pair: >align_id:1855697|asmbl_1061 gene=PASA_cluster_907, >align_id:1917658|asmbl_63022 gene=PASA_cluster_54328
Alignment [============================================================] 100 %
positive=45 negative=1019
resizing positive
Vector size: 45 min size: 45
resizing negative
Vector size: 1019 min size: 45
index size: 974
positive=45 negative=45
Adding combo 18
new single feature 2
new single feature 16
Adding combo 6
new single feature 4
Adding combo 32
new single feature 32
bounds[0]: 0 to 17418
bounds[1]: 0.0997519 to 1
bounds[2]: 0 to 17418
bounds[3]: 0.272044 to 1.52156
Inverse does not exist
Accuracy: 0% Sensitivity: 0% Specificity: 0%
Accuracy: 0% Sensitivity: 0% Specificity: 0%
Adding combo 1026
new single feature 1024
bounds[0]: 0 to 17418
bounds[1]: 0.0997519 to 1
bounds[2]: 0 to 17418
bounds[3]: 0.272044 to 1.52156
bounds[4]: 34488.1 to 65536
Inverse does not exist
Accuracy: 0% Sensitivity: 0% Specificity: 0%
Accuracy: 0% Sensitivity: 0% Specificity: 0%
Final: feat size is 4
Using 4 features Feb 2 2018
error: list not sorted ] 2 %
terminate called after throwing an instance of 'int'
Aborted (core dumped)
Hi. I am happy to help. In order to reproduce this error on my machine, would you share the input sequences that caused this error? My email address is hzgirgis at buffalo dot edu