reciprocalspaceship
reciprocalspaceship copied to clipboard
Resolving indexing ambiguities when comparing datasets
One of the goals of reciprocalspaceship
is to make it easier to compare datasets in reciprocal space. For some spacegroups, indexing ambiguities can confound such comparisons. We should add some functions to rs
to make it easier to deal with these spacegroups.
In particular, we can use gemmi.find_twin_laws()
to identify possible reindexing operations, and we can use correlation coefficients to determine what the best operation is. The gemmi function seems to produce output consistent with the CCP4 table (poor formatting), but I've only tested a handful of cases:
# PYP
In [1]: gemmi.find_twin_laws(gemmi.UnitCell(66.9, 66.9, 40.8, 90, 90, 120), gemmi.SpaceGroup("P 63"), max_obliq=1.0, all_ops=False)
Out[1]: [<gemmi.Op("-x,-x+y,-z")>]
# PTP1B
In [2]: gemmi.find_twin_laws(gemmi.UnitCell(89.3, 89.3, 105.7, 90, 90, 120), gemmi.SpaceGroup("P 31 2 1"), max_obliq=1.0, all_ops=False)
Out[2]: [<gemmi.Op("-x,-y,z")>]
I think this should involve two additions to rs
:
- Helper function for
rs.DataSet
that returns possible reindexing operations (perhapsrs.utils.possible_reindexing_ops(dataset)
). This function could then be used elsewhere, wherever such a functionality would be desired. -
rs.algorithms.resolve_indexing_ambiguity(dataset1, dataset2, column_key="IMEAN")
, which applies the reindexing operation to dataset2 that gives the best correlation in IMEAN.
Please let me know if anyone has any thoughts on this -- together with #31, this should handle some of the gotchas when making comparisons across isomorphous structures/datasets
I think this is a great feature to add. I'm not crazy about the function names. I think we could workshop them a little. Here are alternative signatures off the top of my head. Not saying these are perfect either.
-
rs.DataSet.reindexing_ops()
<-- I see this as more of a bound method thing. theutils
method should just take acell
andspacegroup
to be consistent with the rest of theutils
api. However, the gemmi method already covers this. In my mind, there is no need to wrap it at theutils
level. Thoughts? -
rs.algorithms.indexing.reindex_like(reindex_dataset, template_dataset, column_key='auto')
<-- just spitballing here, but I don't hate the idea of analgorithms.indexing
namespace.
I like the plan for 1 -- I think DataSet.reindexing_ops()
is fine. It doesn't necessarily need a supporting utils
method because it's already a one-liner to gemmi
.
For two, I don't love reindex_like()
-- maybe reindex_to_ref()
? More generally, I'm also fine with refactoring to have submodules in rs.algorithms
.
- Great
- I'll throw some more out there
-
reindex_like_ref(dataset, reference, ...)
-
reindex_as_ref(dataset, reference, ...)
-
reindex(dataset, reference, ...)
-
index_same_as(dataset, reference)
reindex_like_ref(dataset, reference, ...)
is my preference
I'm glad that you think about using gemmi.find_twin_laws()
. This is why this function was added. Like you, I wanted to compare mtz files regardless of reindexing. I asked around how to get all possible reindexing operations and I was advised that the reindexing operations are the same as merohedral twinning laws. Learning how to determine the twinning laws took me a while, and then I had to postpone reindexing and do other things.
My understanding is that if two crystals have the same space group and unit cell parameters, the possible twinning operations and reindexing operations are simply the same. But if the cell parameters differ, then I'm not sure what to do. There are several papers, most of them by Andrews & Bernstein, on how to find alternative unit cells and how to measure distance between unit cells (example). My thinking was that perhaps combining twinning laws with the search of alternative unit cells would give me all reindexing operations, but actually I don't know if it makes sense.
Thanks for adding some more info, @wojdyr. In this case, I will include a call to DataSet.is_isomorphous()
in this function to enforce that dataset
and reference
have similar unit cell parameters and the same space group setting. That will at least restrict the use of this method to supported cases.
Regarding DataSet.reindexing_ops
, I could imagine implementing this in two ways: 1) bound method or 2) attribute. The rationale for making it an attribute is that it doesn't depend on anything other than spacegroup+cell. In practice, the main difference is whether it gets invoked with ()
or not.
I'm inclined to make this an attribute, using something like:
@property
def reindexing_ops(self):
"""Possible reindexing operations (twin laws) for DataSet"""
return gemmi.find_twin_laws(self.cell, self.spacegroup, max_obliq=1.0, all_ops=False)
Please comment if anyone thinks a method will be better suited to this task.
I don't have strong feelings about this. If you choose to make it a method, I'd prefer a name like get_reindexing_ops
so that it is clearly a method and not an attribute.
Sounds good -- I agree that a method name should get an action verb.
My intent for this function/attribute is to support only merohedral twinning laws (source of indexing ambiguity). If we care about pseudo-merohedral twinning here, we would want to be able to accept a max_obliq
$\delta$-angle that we can pass along to the gemmi function (see gemmi docs).
If we only support the merohedral case (I think reindexing_ops
can really only be considered the merohedral case, whereas "twin laws" could refer to either), then I think the attribute implementation is most appropriate.
i wouldn't be opposed to having both
@property
def reindexing_ops(self):
and
def find_twin_laws(self, max_obliq=1.0):
fair enough -- at the end of the day these are just convenience methods that call to gemmi
and simplify the API a bit
my thoughts exactly. i don't think it is a problem if we have slightly redundant ways to do the same thing.