the halting problem, NG50 compared with a few other metrics

Open jbh-cas opened this issue 4 years ago • 0 comments

Thanks for the wonderful tool. Recently I ran Salsa2 with -i 10 and it terminated before reaching that number of iterations. The results were good, but several super-scaffolds that a 3D-dna run had produced were missing.

Inspecting the code, I saw that an NG50 improvement test determines when to break out of the loop. To see what more iterations might accomplish, I commented that test out and reran with -i 30 to create the 31 scaffolds_ITERATION_# agp files.

I'm attaching stats_scaffolds_ITERATION.txt (I changed the suffix to txt for uploading), a short awk script that printed the info below for each of the agp files. The columns are filename, total of scaffold lengths, number of scaffolds, and various N#/L# values. An asterisk before a value means it is the same as it was in the prior agp file.
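For readers without the attachment, here is a minimal Python sketch of how such per-file statistics could be computed. It is not the attached awk script. It assumes the standard tab-separated AGP layout, where column 1 is the scaffold name and column 3 is the object end coordinate, so a scaffold's length (gaps included) is the largest end coordinate seen for it. The function names `scaffold_lengths_from_agp` and `nx_lx` and the example records are made up for illustration.

```python
from collections import defaultdict


def scaffold_lengths_from_agp(lines):
    """Return {scaffold_name: length}, taking length as the largest object_end."""
    lengths = defaultdict(int)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and AGP comment/header lines
        fields = line.rstrip("\n").split("\t")
        name, object_end = fields[0], int(fields[2])
        lengths[name] = max(lengths[name], object_end)
    return dict(lengths)


def nx_lx(lengths, x):
    """Return (Nx, Lx) for a percentage x, e.g. x=50 gives (N50, L50)."""
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    return 0, 0


# Made-up AGP records: s1 is two contigs joined by a 50 bp gap.
example_agp = [
    "s1\t1\t100\t1\tW\tc1\t1\t100\t+",
    "s1\t101\t150\t2\tN\t50\tscaffold\tyes\tproximity_ligation",
    "s1\t151\t400\t3\tW\tc2\t1\t250\t+",
    "s2\t1\t300\t1\tW\tc3\t1\t300\t+",
    "s3\t1\t100\t1\tW\tc4\t1\t100\t+",
]
lengths = scaffold_lengths_from_agp(example_agp)
print(len(lengths), sum(lengths.values()))  # 3 800
print(nx_lx(lengths.values(), 50))          # (400, 1)
print(nx_lx(lengths.values(), 90))          # (100, 3)
```

To process a real file, the lines could be read with `open(path)` and passed to `scaffold_lengths_from_agp` in the same way.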

You can see the N50 test has iter_7 as the first repeat, but using the number of scaffolds it's iter_10, which has all the other values repeated from iter_9 as well.

If we allow one additional iteration of look-ahead, then iter_11 gives us something new. iter_13 repeats iter_12, but again a one-iteration look-ahead gets us something new at iter_14. The first double repeat (i.e., 3 of the same set of values in a row) starts at iter_16, which begins a run of 8 identical sets of values. So that improves from iter_6 (142 scaffolds, N50 60,979,473, L50 16) to iter_16 (126 scaffolds, N50 79,034,342, L50 14).
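The look-ahead idea can be sketched in Python as below. This is only an illustration of the stopping rule described above, not SALSA's actual loop. The function name `first_plateau` and its `repeats` parameter are hypothetical. The two lists are the N50 and scaffold-count columns for iterations 1 through 18 of the output that follows.

```python
def first_plateau(values, repeats=1):
    """Return the 1-based iteration at which `values` has matched its
    predecessor `repeats` times in a row, or None if that never happens."""
    streak = 0
    for i in range(1, len(values)):
        streak = streak + 1 if values[i] == values[i - 1] else 0
        if streak >= repeats:
            return i + 1
    return None


# Iterations 1..18, copied from the stats output.
n50 = [29638679, 36411706, 40639694, 50315364, 50898031, 60979473,
       60979473, 67572291, 67572291, 67572291, 67572291, 67716364,
       67716364, 67716364, 68788111, 79034342, 79034342, 79034342]
scaffolds = [183, 167, 155, 150, 146, 142, 140, 135, 133,
             133, 132, 131, 131, 130, 128, 126, 126, 126]

print(first_plateau(n50))           # 7: the N50 test sees its first repeat
print(first_plateau(scaffolds))     # 10: first repeat in scaffold count
print(first_plateau(scaffolds, 2))  # 18: third identical count in a row
```

When the rule fires at iteration k with `repeats` set to r, the last iteration that changed anything is k - r, so the `repeats=2` result of 18 corresponds to the plateau that starts at iter_16.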

Anyway, food for thought. Thanks again for a great tool.

--Jim Henderson, California Academy of Sciences

output:

scaffolds_ITERATION_1.agp	 2644769511	 183	 N50:29638679 L50:30	 N60:24241170 L60:40	 N70:20428790 L70:52	 N80:14244363 L80:66	 N90:7701609 L90:92
scaffolds_ITERATION_2.agp	 2644777511	 167	 N50:36411706 L50:24	 N60:28328424 L60:33	 N70:23927955 L70:43	 N80:18131835 L80:56	 N90:8502176 L90:78
scaffolds_ITERATION_3.agp	 2644783511	 155	 N50:40639694 L50:20	 N60:31698195 L60:27	 N70:25231206 L70:36	 N80:19159797 L80:49	 N90:9145689 L90:67
scaffolds_ITERATION_4.agp	 2644786011	 150	 N50:50315364 L50:18	 N60:36411706 L60:24	 N70:27576223 L70:33	 N80:19959479 L80:44	*N90:9145689 L90:63
scaffolds_ITERATION_5.agp	 2644788011	 146	 N50:50898031 L50:17	 N60:40166381 L60:23	 N70:29638679 L70:31	 N80:20428790 L80:42	 N90:9663506 L90:59
scaffolds_ITERATION_6.agp	 2644790011	 142	 N50:60979473 L50:16	 N60:40538746 L60:22	 N70:31406863 L70:29	 N80:20615824 L80:39	 N90:9949571 L90:56
scaffolds_ITERATION_7.agp	 2644791011	 140	*N50:60979473 L50:16	 N60:40639694 L60:21	*N70:31406863 L70:29	*N80:20615824 L80:39	 N90:9989659 L90:55
scaffolds_ITERATION_8.agp	 2644793511	 135	 N50:67572291 L50:15	 N60:44061934 L60:20	 N70:31746683 L70:27	 N80:21748937 L80:37	 N90:13181980 L90:51
scaffolds_ITERATION_9.agp	 2644794511	 133	*N50:67572291 L50:15	 N60:50315364 L60:19	 N70:33095761 L70:26	 N80:24234933 L80:35	*N90:13181980 L90:50
scaffolds_ITERATION_10.agp	*2644794511	*133	*N50:67572291 L50:15	*N60:50315364 L60:19	*N70:33095761 L70:26	*N80:24234933 L80:35	*N90:13181980 L90:50
scaffolds_ITERATION_11.agp	 2644795011	 132	*N50:67572291 L50:15	 N60:58838040 L60:19	 N70:36411706 L70:25	*N80:24234933 L80:34	*N90:13181980 L90:49
scaffolds_ITERATION_12.agp	 2644795511	 131	 N50:67716364 L50:15	*N60:58838040 L60:19	 N70:40166381 L70:24	*N80:24234933 L80:33	*N90:13181980 L90:48
scaffolds_ITERATION_13.agp	*2644795511	*131	*N50:67716364 L50:15	*N60:58838040 L60:19	*N70:40166381 L70:24	*N80:24234933 L80:33	*N90:13181980 L90:48
scaffolds_ITERATION_14.agp	 2644796011	 130	*N50:67716364 L50:15	*N60:58838040 L60:19	*N70:40166381 L70:24	*N80:24234933 L80:33	 N90:13773325 L90:47
scaffolds_ITERATION_15.agp	 2644797011	 128	 N50:68788111 L50:15	 N60:67206133 L60:18	*N70:40166381 L70:24	*N80:24234933 L80:33	 N90:13922777 L90:46
scaffolds_ITERATION_16.agp	 2644798011	 126	 N50:79034342 L50:14	*N60:67206133 L60:18	 N70:40538746 L70:23	 N80:24261773 L80:31	*N90:13922777 L90:45
scaffolds_ITERATION_17.agp	*2644798011	*126	*N50:79034342 L50:14	*N60:67206133 L60:18	*N70:40538746 L70:23	*N80:24261773 L80:31	*N90:13922777 L90:45
scaffolds_ITERATION_18.agp	*2644798011	*126	*N50:79034342 L50:14	*N60:67206133 L60:18	*N70:40538746 L70:23	*N80:24261773 L80:31	*N90:13922777 L90:45
scaffolds_ITERATION_19.agp	*2644798011	*126	*N50:79034342 L50:14	*N60:67206133 L60:18	*N70:40538746 L70:23	*N80:24261773 L80:31	*N90:13922777 L90:45
scaffolds_ITERATION_20.agp	*2644798011	*126	*N50:79034342 L50:14	*N60:67206133 L60:18	*N70:40538746 L70:23	*N80:24261773 L80:31	*N90:13922777 L90:45
scaffolds_ITERATION_21.agp	*2644798011	*126	*N50:79034342 L50:14	*N60:67206133 L60:18	*N70:40538746 L70:23	*N80:24261773 L80:31	*N90:13922777 L90:45
scaffolds_ITERATION_22.agp	*2644798011	*126	*N50:79034342 L50:14	*N60:67206133 L60:18	*N70:40538746 L70:23	*N80:24261773 L80:31	*N90:13922777 L90:45
scaffolds_ITERATION_23.agp	*2644798011	*126	*N50:79034342 L50:14	*N60:67206133 L60:18	*N70:40538746 L70:23	*N80:24261773 L80:31	*N90:13922777 L90:45
scaffolds_ITERATION_24.agp	 2644798511	 125	*N50:79034342 L50:13	*N60:67206133 L60:17	*N70:40538746 L70:22	*N80:24261773 L80:30	*N90:13922777 L90:44
scaffolds_ITERATION_25.agp	*2644798511	*125	*N50:79034342 L50:13	*N60:67206133 L60:17	*N70:40538746 L70:22	*N80:24261773 L80:30	*N90:13922777 L90:44
scaffolds_ITERATION_26.agp	*2644798511	*125	*N50:79034342 L50:13	*N60:67206133 L60:17	*N70:40538746 L70:22	*N80:24261773 L80:30	*N90:13922777 L90:44
scaffolds_ITERATION_27.agp	*2644798511	*125	*N50:79034342 L50:13	*N60:67206133 L60:17	*N70:40538746 L70:22	*N80:24261773 L80:30	*N90:13922777 L90:44
scaffolds_ITERATION_28.agp	*2644798511	*125	*N50:79034342 L50:13	*N60:67206133 L60:17	*N70:40538746 L70:22	*N80:24261773 L80:30	*N90:13922777 L90:44
scaffolds_ITERATION_29.agp	 2644799011	 124	*N50:79034342 L50:13	*N60:67206133 L60:17	*N70:40538746 L70:22	 N80:26372909 L80:30	 N90:14244363 L90:43
scaffolds_ITERATION_30.agp	*2644799011	*124	*N50:79034342 L50:13	*N60:67206133 L60:17	*N70:40538746 L70:22	*N80:26372909 L80:30	*N90:14244363 L90:43
scaffolds_ITERATION_31.agp	*2644799011	*124	*N50:79034342 L50:13	*N60:67206133 L60:17	*N70:40538746 L70:22	*N80:26372909 L80:30	*N90:14244363 L90:43

jbh-cas, May 10 '20 00:05