poly
Go implementation of Nonrepetitive Parts Calculator
As discussed in Ayaan Hossein's 2020 first-author paper (a collaboration between the Salis and Klavins labs), repeated identical DNA sequences are a significant source of instability in synthetic genetic parts and devices. Cells can use two sites with the same DNA sequence as a substrate for homologous recombination, deleting or duplicating everything between the repeats. This repeat-mediated recombination can break synthetic genetic devices at a rate much higher than a cell's baseline rate of genomic mutation, and it becomes a major problem for synthetic genetic circuits that put a significant metabolic load on the cell, like multicopy recombinant protein production operons or multilayer genetic logic gates.
Hossein led the development of an incredibly useful software package to address this problem: the Non-Repetitive Parts Calculator, or NRPcalc. NRPcalc takes as input the length of the sequences you want, the number of sequence variants you'd like in the set, the longest shared identical sequence you want to allow (Lmax), a 'background' set of sequences you'd like the new sequences to be 'repeat-orthogonal' to (like the genome sequence of a particular organism), and any additional sequence constraints you want to add (forbidden restriction sites etc). NRPcalc then designs a set of sequences that match your specifications and share no identical stretch longer than Lmax bases with each other or with the background sequences. NRPcalc also has a function that takes an existing set of parts and tries to find the largest nonrepetitive subset of those parts, that is, the largest subset in which no two parts share an identical sequence longer than Lmax. As shown in the Hossein et al. paper above, these nonrepetitive genetic parts are much more genetically stable than repeated genetic elements with identical sequences. The Salis lab is also working on a version of NRPcalc specialized for protein-coding gene sequences, which Howard Salis kindly allowed me to try out through a web-based submission page. I think these packages rule and their functionality will be very important for bioengineering going forward.
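To make the Lmax rule concrete, here is a minimal Go sketch of the "filter an existing set of parts" idea. It is not NRPcalc's actual algorithm: it is a simple greedy pass that keeps a part only if none of its (Lmax+1)-mers has been seen before, so it may return a smaller subset than NRPcalc would. The function names are hypothetical and not part of Poly, sequences are assumed to be uppercase DNA strings, and reverse complements are ignored for brevity.

```go
package main

import "fmt"

// kmers returns every substring of length k in seq, in order.
func kmers(seq string, k int) []string {
	if k <= 0 || len(seq) < k {
		return nil
	}
	out := make([]string, 0, len(seq)-k+1)
	for i := 0; i+k <= len(seq); i++ {
		out = append(out, seq[i:i+k])
	}
	return out
}

// NonrepetitiveSubset (hypothetical name) greedily keeps parts that share no
// identical stretch longer than lmax with the background, with parts already
// kept, or with themselves. Greedy order matters; this is only a sketch.
func NonrepetitiveSubset(parts, background []string, lmax int) []string {
	k := lmax + 1 // any shared substring of this length is a forbidden repeat
	seen := make(map[string]bool)
	for _, bg := range background {
		for _, km := range kmers(bg, k) {
			seen[km] = true
		}
	}
	var kept []string
	for _, part := range parts {
		own := make(map[string]bool)
		ok := true
		for _, km := range kmers(part, k) {
			if seen[km] || own[km] { // clash with an earlier sequence or an internal repeat
				ok = false
				break
			}
			own[km] = true
		}
		if !ok {
			continue
		}
		for km := range own {
			seen[km] = true
		}
		kept = append(kept, part)
	}
	return kept
}

func main() {
	parts := []string{"ATGCATTAGC", "GGGCATTACC", "TTTTCCCCGG"}
	// The first two parts share the 6-base stretch GCATTA, so with Lmax = 4
	// the second part is dropped.
	fmt.Println(NonrepetitiveSubset(parts, nil, 4))
}
```

Storing every (Lmax+1)-mer in a map keeps each membership check cheap, at the cost of memory when the background is a whole genome; a real implementation would likely need a more compact index and would also have to check reverse complements.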
I would love to see and use a Go implementation of NRPcalc (for both CDS and non-CDS sequences) in Poly. It would be especially cool to implement NRPcalc with an option to automatically take common gene synthesis constraints into account, like homopolymer length, restriction sites, hairpins and codon frequency. I would use this functionality a ton.
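As a rough illustration of what such a constraint option could look like, here is a small Go sketch covering two of the constraints mentioned above: homopolymer length and forbidden restriction sites. The `SynthesisConstraints` type and its method are hypothetical, not part of Poly or NRPcalc. Hairpin and codon frequency checks are left out, and only the given strand is searched, so non-palindromic sites (like BsaI's GGTCTC) would also need their reverse complements listed.

```go
package main

import (
	"fmt"
	"strings"
)

// SynthesisConstraints is a hypothetical options struct, not part of Poly's API.
type SynthesisConstraints struct {
	MaxHomopolymer int      // longest allowed run of a single base
	ForbiddenSites []string // e.g. restriction enzyme recognition sites
}

// longestHomopolymer returns the length of the longest run of one base in seq.
func longestHomopolymer(seq string) int {
	longest, run := 0, 0
	for i := 0; i < len(seq); i++ {
		if i > 0 && seq[i] == seq[i-1] {
			run++
		} else {
			run = 1
		}
		if run > longest {
			longest = run
		}
	}
	return longest
}

// Violations lists the constraints that seq breaks; an empty result means seq passes.
func (c SynthesisConstraints) Violations(seq string) []string {
	var out []string
	if n := longestHomopolymer(seq); n > c.MaxHomopolymer {
		out = append(out, fmt.Sprintf("homopolymer run of %d exceeds %d", n, c.MaxHomopolymer))
	}
	for _, site := range c.ForbiddenSites {
		if strings.Contains(seq, site) {
			out = append(out, "contains forbidden site "+site)
		}
	}
	return out
}

func main() {
	c := SynthesisConstraints{
		MaxHomopolymer: 5,
		ForbiddenSites: []string{"GAATTC", "GGTCTC"}, // EcoRI, BsaI
	}
	for _, v := range c.Violations("ATGAAAAAAGAATTCGC") {
		fmt.Println(v)
	}
}
```

A check like this could be run on each candidate sequence before the repeat check, so that parts which cannot be synthesized never enter the nonrepetitive set.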