Symmetry detection fails for long twisted assemblies
For examples look at 6CL5, 6GAO, 5JXC.
The problem, I believe, is in the following:
- Subunits are represented via their centers: https://github.com/biojava/biojava/blob/2bb08570f8a79e27e958b5c136207e2192868aa7/biojava-structure/src/main/java/org/biojava/nbio/structure/symmetry/core/RotationSolver.java#L459
- Permutation of the centers is used to derive transformations: https://github.com/biojava/biojava/blob/2bb08570f8a79e27e958b5c136207e2192868aa7/biojava-structure/src/main/java/org/biojava/nbio/structure/symmetry/core/RotationSolver.java#L232-L233... https://github.com/biojava/biojava/blob/2bb08570f8a79e27e958b5c136207e2192868aa7/biojava-structure/src/main/java/org/biojava/nbio/structure/symmetry/core/RotationSolver.java#L237
- The transformations are applied to the whole subunits: https://github.com/biojava/biojava/blob/2bb08570f8a79e27e958b5c136207e2192868aa7/biojava-structure/src/main/java/org/biojava/nbio/structure/symmetry/core/RotationSolver.java#L260
- The centers of long twisted chains do not represent the chains well enough. E.g., in 6CL5 the centers are within ~1 Å of each other. Since the chains are not perfectly symmetrical, the transformations derived from the centers are essentially random.
There may also be a problem in generating permutations for such structures. The "closest neighbor" center would not necessarily represent the chain that's being superposed, which may affect the validity of the permutations. I do not yet fully understand how this part of the code works, but it's around here: https://github.com/biojava/biojava/blob/2bb08570f8a79e27e958b5c136207e2192868aa7/biojava-structure/src/main/java/org/biojava/nbio/structure/symmetry/core/RotationSolver.java#L401
If we replace the centers with the full list of coordinates, it will fix the problem, but it will obviously hurt performance by quite a bit. So I suggest something in between: splitting the chains into several segments of consecutive residues and calculating centers for those segments. The number of segments can be a parameter in the QuatSymmetryParameters class.
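To make the in-between proposal concrete, here is a minimal sketch of computing per-segment centers for one chain. It is not BioJava code: the class `SegmentCenters`, the method `segmentCentroids`, and the plain `double[][]` coordinate layout are assumptions for illustration, and `numSegments` stands in for the parameter that would have to be added to QuatSymmetryParameters.

```java
// Hypothetical helper, not part of BioJava: represents a chain by the
// centroids of consecutive residue segments instead of a single center.
final class SegmentCenters {

    /**
     * Splits coords (N rows of x, y, z) into numSegments runs of consecutive
     * rows of near-equal length and returns the centroid of each run.
     */
    static double[][] segmentCentroids(double[][] coords, int numSegments) {
        if (numSegments < 1 || coords.length < numSegments) {
            throw new IllegalArgumentException(
                    "need 1 <= numSegments <= number of coordinates");
        }
        int n = coords.length;
        double[][] centers = new double[numSegments][3];
        for (int s = 0; s < numSegments; s++) {
            // Integer boundaries; every segment holds at least one row
            // because n >= numSegments.
            int start = (int) ((long) s * n / numSegments);
            int end = (int) ((long) (s + 1) * n / numSegments);
            for (int i = start; i < end; i++) {
                for (int d = 0; d < 3; d++) {
                    centers[s][d] += coords[i][d];
                }
            }
            for (int d = 0; d < 3; d++) {
                centers[s][d] /= (end - start);
            }
        }
        return centers;
    }
}
```

With `numSegments = 1` this reduces to the current single-center representation, so the existing behavior would be one setting of the same parameter. The segment boundaries depend only on the chain length, which assumes the subunits being compared have the same number of aligned residues.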
Any thoughts, suggestions? Thank you!
Interesting case! How about checking whether the centers of mass of the subunits are too close together (below some distance threshold) and, if so, switching to the all-atom comparison?
On Fri, Aug 24, 2018 at 6:02 PM Dmytro Guzenko [email protected] wrote:
-- Peter Rose, Ph.D. Director, Structural Bioinformatics Laboratory San Diego Supercomputer Center UC San Diego +1-858-822-5497
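The threshold check suggested above could look roughly like the following sketch. The class and method names are hypothetical, the centers are plain `double[3]` arrays rather than BioJava types, and using the minimum pairwise distance as the criterion is an assumption; the threshold value would need to be tuned on cases such as 6CL5.

```java
// Hypothetical helper, not part of BioJava: decides whether subunit centers
// are too close together for the center-based solver to be trusted.
final class CentroidSeparation {

    /** Smallest Euclidean distance between any two centers. */
    static double minPairwiseDistance(double[][] centers) {
        double min = Double.POSITIVE_INFINITY; // returned if fewer than 2 centers
        for (int i = 0; i < centers.length; i++) {
            for (int j = i + 1; j < centers.length; j++) {
                double dx = centers[i][0] - centers[j][0];
                double dy = centers[i][1] - centers[j][1];
                double dz = centers[i][2] - centers[j][2];
                min = Math.min(min, Math.sqrt(dx * dx + dy * dy + dz * dz));
            }
        }
        return min;
    }

    /** True if any two centers are closer than thresholdAngstrom. */
    static boolean centersTooClose(double[][] centers, double thresholdAngstrom) {
        return minPairwiseDistance(centers) < thresholdAngstrom;
    }
}
```

A caller would run the existing center-based solver when `centersTooClose` returns false and fall back to a richer representation otherwise, so the common case keeps its current cost.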
These assemblies have very elongated subunits and large interfaces, so those properties could also be used to identify them and trigger the switch to the all-atom comparison.
All-atom is probably overkill if we're worried about performance. We could get by with some small set of atoms that preserves the orientation. The principal axes might work, or even a small random sample (say, 10 atoms).
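As a rough sketch of the small-random-sample idea: pick one set of atom indices and reuse it for every subunit, so the sampled points stay in correspondence across subunits. The names below are hypothetical and not BioJava API, and the approach assumes that equivalent atoms sit at the same index in every subunit's coordinate array.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, not part of BioJava: reduces each subunit to a small,
// shared subset of atoms that still carries orientation information.
final class AtomSubsample {

    /** Draws sampleSize distinct indices in [0, numAtoms), sorted ascending. */
    static int[] sampleIndices(int numAtoms, int sampleSize, long seed) {
        if (sampleSize < 0 || sampleSize > numAtoms) {
            throw new IllegalArgumentException("need 0 <= sampleSize <= numAtoms");
        }
        int[] idx = new int[numAtoms];
        for (int i = 0; i < numAtoms; i++) {
            idx[i] = i;
        }
        Random rng = new Random(seed); // fixed seed keeps results reproducible
        // Partial Fisher-Yates shuffle: only the first sampleSize slots are needed.
        for (int i = 0; i < sampleSize; i++) {
            int j = i + rng.nextInt(numAtoms - i);
            int tmp = idx[i];
            idx[i] = idx[j];
            idx[j] = tmp;
        }
        int[] chosen = Arrays.copyOf(idx, sampleSize);
        Arrays.sort(chosen);
        return chosen;
    }

    /** Returns the rows of coords at the given indices (rows are not copied). */
    static double[][] select(double[][] coords, int[] indices) {
        double[][] out = new double[indices.length][];
        for (int k = 0; k < indices.length; k++) {
            out[k] = coords[indices[k]];
        }
        return out;
    }
}
```

Calling `sampleIndices` once per cluster of equivalent subunits and then `select` on each subunit gives, for a sample size of 10, ten corresponding points per subunit instead of one center.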
The centroid method fails to account for orientation, so it would also fail for globular asymmetric complexes if their centroids happened to fall on a circle. We've been fortunate so far in that the vast majority of homooligomers really do have symmetry so the centroid heuristic works well.
This might improve helix visualization too (#306)