reciprocalspaceship Resolving indexing ambiguities when comparing datasets

One of the goals of reciprocalspaceship is to make it easier to compare datasets in reciprocal space. For some spacegroups, indexing ambiguities can confound such comparisons. We should add some functions to rs to make it easier to deal with these spacegroups.

In particular, we can use gemmi.find_twin_laws() to identify possible reindexing operations, and we can use correlation coefficients to determine what the best operation is. The gemmi function seems to produce output consistent with the CCP4 table (poor formatting), but I've only tested a handful of cases:

# PYP
In [1]: gemmi.find_twin_laws(gemmi.UnitCell(66.9, 66.9, 40.8, 90, 90, 120), gemmi.SpaceGroup("P 63"), max_obliq=1.0, all_ops=False)
Out[1]: [<gemmi.Op("-x,-x+y,-z")>]

# PTP1B
In [2]: gemmi.find_twin_laws(gemmi.UnitCell(89.3, 89.3, 105.7, 90, 90, 120), gemmi.SpaceGroup("P 31 2 1"), max_obliq=1.0, all_ops=False)
Out[2]: [<gemmi.Op("-x,-y,z")>]

I think this should involve two additions to rs:

Helper function for rs.DataSet that returns possible reindexing operations (perhaps rs.utils.possible_reindexing_ops(dataset)). This function could then be used elsewhere, wherever such a functionality would be desired.
rs.algorithms.resolve_indexing_ambiguity(dataset1, dataset2, column_key="IMEAN"), which applies the reindexing operation to dataset2 that gives the best correlation in IMEAN.

Please let me know if anyone has any thoughts on this -- together with #31, this should handle some of the gotchas when making comparisons across isomorphous structures/datasets

Aug 05 '22 18:08 JBGreisman

I think this is a great feature to add. I'm not crazy about the function names. I think we could workshop them a little. Here are alternative signatures off the top of my head. Not saying these are perfect either.

rs.DataSet.reindexing_ops() <-- I see this as more of a bound method thing. the utils method should just take a cell and spacegroup to be consistent with the rest of the utils api. However, the gemmi method already covers this. In my mind, there is no need to wrap it at the utils level. Thoughts?
rs.algorithms.indexing.reindex_like(reindex_dataset, template_dataset, column_key='auto') <-- just spitballing here, but I don't hate the idea of an algorithms.indexing namespace.

Aug 05 '22 19:08 kmdalton

I like the plan for 1 -- I think DataSet.reindexing_ops() is fine. It doesn't necessarily need a supporting utils method because it's already a one-liner to gemmi.

For two, I don't love reindex_like() -- maybe reindex_to_ref()? More generally, I'm also fine with refactoring to have submodules in rs.algorithms.

Aug 05 '22 19:08 JBGreisman

Great
I'll throw some more out there

reindex_like_ref(dataset, reference, ...)
reindex_as_ref(dataset, reference, ...)
reindex(dataset, reference, ...)
index_same_as(dataset, reference)

Aug 05 '22 19:08 kmdalton

reindex_like_ref(dataset, reference, ...) is my preference

Aug 05 '22 20:08 JBGreisman

I'm glad that you think about using gemmi.find_twin_laws(). This is why this function was added. Like you, I wanted to compare mtz files regardless of reindexing. I asked around how to get all possible reindexing operations and I was advised that the reindexing operations are the same as merohedral twinning laws. Learning how to determine the twinning laws took me a while, and then I had to postpone reindexing and do other things.

My understanding is that if two crystals have the same space group and unit cell parameters, the possible twinning operations and reindexing operations are simply the same. But if the cell parameters differ, then I'm not sure what to do. There are several papers, most of them by Andrews & Bernstein, on how to find alternative unit cells and how to measure distance between unit cells (example). My thinking was that perhaps combining twinning laws with the search of alternative unit cells would give me all reindexing operations, but actually I don't know if it makes sense.

Aug 08 '22 23:08 wojdyr

Thanks for adding some more info, @wojdyr. In this case, I will include a call to DataSet.is_isomorphous() in this function to enforce that dataset and reference have similar unit cell parameters and the same space group setting. That will at least restrict the use of this method to supported cases.

Aug 09 '22 01:08 JBGreisman

Regarding DataSet.reindexing_ops, I could imagine implementing this in two ways: 1) bound method or 2) attribute. The rationale for making it an attribute is that it doesn't depend on anything other than spacegroup+cell. In practice, the main difference is whether it gets invoked with () or not.

I'm inclined to make this an attribute, using something like:

@property
def reindexing_ops(self):
    """Possible reindexing operations (twin laws) for DataSet"""
    return gemmi.find_twin_laws(self.cell, self.spacegroup, max_obliq=1.0, all_ops=False)

Please comment if anyone thinks a method will be better suited to this task.

Sep 07 '22 13:09 JBGreisman

I don't have strong feelings about this. If you choose to make it a method, I'd prefer a name like get_reindexing_ops so that it is clearly a method and not an attribute.

Sep 07 '22 14:09 kmdalton

Sounds good -- I agree that a method name should get an action verb.

My intent for this function/attribute is to support only merohedral twinning laws (source of indexing ambiguity). If we care about pseudo-merohedral twinning here, we would want to be able to accept a max_obliq $\delta$-angle that we can pass along to the gemmi function (see gemmi docs).

If we only support the merohedral case (I think reindexing_ops can really only be considered the merohedral case, whereas "twin laws" could refer to either), then I think the attribute implementation is most appropriate.

Sep 07 '22 15:09 JBGreisman

i wouldn't be opposed to having both

@property
def reindexing_ops(self):

and

def find_twin_laws(self, max_obliq=1.0):

Sep 07 '22 16:09 kmdalton

fair enough -- at the end of the day these are just convenience methods that call to gemmi and simplify the API a bit

Sep 07 '22 16:09 JBGreisman

my thoughts exactly. i don't think it is a problem if we have slightly redundant ways to do the same thing.

Sep 07 '22 16:09 kmdalton

reciprocalspaceship reciprocalspaceship copied to clipboard

Resolving indexing ambiguities when comparing datasets

reciprocalspaceship
reciprocalspaceship copied to clipboard