SALSA
the halting problem, NG50 compared with a few other metrics
Thanks for the wonderful tool. Recently I ran Salsa2 with arg -i 10 and it terminated before reaching that number of iterations. The results were good, but several super-scaffolds that a 3D-DNA run had produced weren't present.
So, inspecting the code, I saw the NG50 advancement test that determines when to break out of the loop. To see what more iterations might accomplish, I commented that out and reran with a setting of -i 30 to create the 31 scaffolds_ITERATION_# agp files.
I'm attaching the text of stats_scaffolds_ITERATION.txt (I changed the suffix to txt for uploading), a short awk script that printed out the info below for each of the agps. The columns are filename, total of scaffold lengths, number of scaffolds, and various N#/L# values. An asterisk before a value means it's the same as it was in the prior agp file.
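For anyone who wants to reproduce the N#/L# columns without the awk script, here is a minimal Python sketch of the usual definition: sort the scaffold lengths from longest to shortest and walk down until the running total reaches x% of the total length. The function name `nx_lx` and the toy lengths are made up for illustration; this is not the attached script and not Salsa2's own code.

```python
def nx_lx(lengths, x):
    """Return (Nx, Lx) for a list of scaffold lengths and a percentage x.

    Nx is the length of the scaffold at which the running total of the
    sorted lengths first reaches x% of the total; Lx is how many
    scaffolds were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # Integer comparison avoids floating-point rounding on large genomes.
        if running * 100 >= total * x:
            return length, count
    raise ValueError("no scaffold lengths given")


# Toy lengths (hypothetical), total 300.
toy = [100, 80, 60, 40, 20]
print(nx_lx(toy, 50))  # (80, 2): 100 + 80 = 180 reaches half of 300
print(nx_lx(toy, 90))  # (40, 4): 100 + 80 + 60 + 40 = 280 reaches 270
```

NG50 works the same way except that the threshold is taken from an estimated genome size instead of the sum of the scaffold lengths.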
You can see the N50 test has iter_7 as the first repeat, but using number of scaffolds it's iter_10, which has all the other values repeated from iter_9 as well.
If we allow for an additional iteration of look-ahead, then iter_11 gives us something new. We get a repeat of iter_12 at iter_13, but again a 1-iteration look-ahead gets us something new at iter_14. The first double repeat (i.e., 3 of the same set of values in a row) starts at iter_16, which begins a run of 8 identical sets of values. So that improves from iter_6 (142 scaffolds, N50 60,979,473, L50 16) to iter_16 (126 scaffolds, N50 79,034,342, L50 14).
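To make the look-ahead idea concrete, here is a small Python sketch of a stopping test with a "patience" setting: stop only after the metric has repeated its previous value that many iterations in a row. The function name `first_stall` and the `patience` parameter are my own illustration, not anything in the Salsa2 code; the two lists are just the scaffold-count and N50 columns from the output below.

```python
def first_stall(history, patience=1):
    """Return the 1-based iteration at which the metric has repeated the
    previous iteration's value `patience` times in a row, else None."""
    repeats = 0
    for i in range(1, len(history)):
        if history[i] == history[i - 1]:
            repeats += 1
            if repeats >= patience:
                return i + 1  # history[0] is iteration 1
        else:
            repeats = 0
    return None


# Number of scaffolds for scaffolds_ITERATION_1 .. _31 (from the output).
n_scaffolds = [183, 167, 155, 150, 146, 142, 140, 135, 133, 133, 132, 131,
               131, 130, 128] + [126] * 8 + [125] * 5 + [124] * 3

# N50 for the same 31 files (from the output).
n50 = [29638679, 36411706, 40639694, 50315364, 50898031, 60979473, 60979473,
       67572291, 67572291, 67572291, 67572291, 67716364, 67716364, 67716364,
       68788111] + [79034342] * 16

print(first_stall(n50, patience=1))          # 7
print(first_stall(n_scaffolds, patience=1))  # 10
print(first_stall(n_scaffolds, patience=2))  # 18, i.e. the run that began at iter_16
```

Even a patience of 2 would stop before the small changes that show up at iter_24 and iter_29, so any fixed look-ahead is still a trade-off between run time and catching late joins.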
Anyway, food for thought. Thanks again for a great tool.
--Jim Henderson, California Academy of Sciences
output:
scaffolds_ITERATION_1.agp 2644769511 183 N50:29638679 L50:30 N60:24241170 L60:40 N70:20428790 L70:52 N80:14244363 L80:66 N90:7701609 L90:92
scaffolds_ITERATION_2.agp 2644777511 167 N50:36411706 L50:24 N60:28328424 L60:33 N70:23927955 L70:43 N80:18131835 L80:56 N90:8502176 L90:78
scaffolds_ITERATION_3.agp 2644783511 155 N50:40639694 L50:20 N60:31698195 L60:27 N70:25231206 L70:36 N80:19159797 L80:49 N90:9145689 L90:67
scaffolds_ITERATION_4.agp 2644786011 150 N50:50315364 L50:18 N60:36411706 L60:24 N70:27576223 L70:33 N80:19959479 L80:44 *N90:9145689 L90:63
scaffolds_ITERATION_5.agp 2644788011 146 N50:50898031 L50:17 N60:40166381 L60:23 N70:29638679 L70:31 N80:20428790 L80:42 N90:9663506 L90:59
scaffolds_ITERATION_6.agp 2644790011 142 N50:60979473 L50:16 N60:40538746 L60:22 N70:31406863 L70:29 N80:20615824 L80:39 N90:9949571 L90:56
scaffolds_ITERATION_7.agp 2644791011 140 *N50:60979473 L50:16 N60:40639694 L60:21 *N70:31406863 L70:29 *N80:20615824 L80:39 N90:9989659 L90:55
scaffolds_ITERATION_8.agp 2644793511 135 N50:67572291 L50:15 N60:44061934 L60:20 N70:31746683 L70:27 N80:21748937 L80:37 N90:13181980 L90:51
scaffolds_ITERATION_9.agp 2644794511 133 *N50:67572291 L50:15 N60:50315364 L60:19 N70:33095761 L70:26 N80:24234933 L80:35 *N90:13181980 L90:50
scaffolds_ITERATION_10.agp *2644794511 *133 *N50:67572291 L50:15 *N60:50315364 L60:19 *N70:33095761 L70:26 *N80:24234933 L80:35 *N90:13181980 L90:50
scaffolds_ITERATION_11.agp 2644795011 132 *N50:67572291 L50:15 N60:58838040 L60:19 N70:36411706 L70:25 *N80:24234933 L80:34 *N90:13181980 L90:49
scaffolds_ITERATION_12.agp 2644795511 131 N50:67716364 L50:15 *N60:58838040 L60:19 N70:40166381 L70:24 *N80:24234933 L80:33 *N90:13181980 L90:48
scaffolds_ITERATION_13.agp *2644795511 *131 *N50:67716364 L50:15 *N60:58838040 L60:19 *N70:40166381 L70:24 *N80:24234933 L80:33 *N90:13181980 L90:48
scaffolds_ITERATION_14.agp 2644796011 130 *N50:67716364 L50:15 *N60:58838040 L60:19 *N70:40166381 L70:24 *N80:24234933 L80:33 N90:13773325 L90:47
scaffolds_ITERATION_15.agp 2644797011 128 N50:68788111 L50:15 N60:67206133 L60:18 *N70:40166381 L70:24 *N80:24234933 L80:33 N90:13922777 L90:46
scaffolds_ITERATION_16.agp 2644798011 126 N50:79034342 L50:14 *N60:67206133 L60:18 N70:40538746 L70:23 N80:24261773 L80:31 *N90:13922777 L90:45
scaffolds_ITERATION_17.agp *2644798011 *126 *N50:79034342 L50:14 *N60:67206133 L60:18 *N70:40538746 L70:23 *N80:24261773 L80:31 *N90:13922777 L90:45
scaffolds_ITERATION_18.agp *2644798011 *126 *N50:79034342 L50:14 *N60:67206133 L60:18 *N70:40538746 L70:23 *N80:24261773 L80:31 *N90:13922777 L90:45
scaffolds_ITERATION_19.agp *2644798011 *126 *N50:79034342 L50:14 *N60:67206133 L60:18 *N70:40538746 L70:23 *N80:24261773 L80:31 *N90:13922777 L90:45
scaffolds_ITERATION_20.agp *2644798011 *126 *N50:79034342 L50:14 *N60:67206133 L60:18 *N70:40538746 L70:23 *N80:24261773 L80:31 *N90:13922777 L90:45
scaffolds_ITERATION_21.agp *2644798011 *126 *N50:79034342 L50:14 *N60:67206133 L60:18 *N70:40538746 L70:23 *N80:24261773 L80:31 *N90:13922777 L90:45
scaffolds_ITERATION_22.agp *2644798011 *126 *N50:79034342 L50:14 *N60:67206133 L60:18 *N70:40538746 L70:23 *N80:24261773 L80:31 *N90:13922777 L90:45
scaffolds_ITERATION_23.agp *2644798011 *126 *N50:79034342 L50:14 *N60:67206133 L60:18 *N70:40538746 L70:23 *N80:24261773 L80:31 *N90:13922777 L90:45
scaffolds_ITERATION_24.agp 2644798511 125 *N50:79034342 L50:13 *N60:67206133 L60:17 *N70:40538746 L70:22 *N80:24261773 L80:30 *N90:13922777 L90:44
scaffolds_ITERATION_25.agp *2644798511 *125 *N50:79034342 L50:13 *N60:67206133 L60:17 *N70:40538746 L70:22 *N80:24261773 L80:30 *N90:13922777 L90:44
scaffolds_ITERATION_26.agp *2644798511 *125 *N50:79034342 L50:13 *N60:67206133 L60:17 *N70:40538746 L70:22 *N80:24261773 L80:30 *N90:13922777 L90:44
scaffolds_ITERATION_27.agp *2644798511 *125 *N50:79034342 L50:13 *N60:67206133 L60:17 *N70:40538746 L70:22 *N80:24261773 L80:30 *N90:13922777 L90:44
scaffolds_ITERATION_28.agp *2644798511 *125 *N50:79034342 L50:13 *N60:67206133 L60:17 *N70:40538746 L70:22 *N80:24261773 L80:30 *N90:13922777 L90:44
scaffolds_ITERATION_29.agp 2644799011 124 *N50:79034342 L50:13 *N60:67206133 L60:17 *N70:40538746 L70:22 N80:26372909 L80:30 N90:14244363 L90:43
scaffolds_ITERATION_30.agp *2644799011 *124 *N50:79034342 L50:13 *N60:67206133 L60:17 *N70:40538746 L70:22 *N80:26372909 L80:30 *N90:14244363 L90:43
scaffolds_ITERATION_31.agp *2644799011 *124 *N50:79034342 L50:13 *N60:67206133 L60:17 *N70:40538746 L70:22 *N80:26372909 L80:30 *N90:14244363 L90:43