[WIP] New Structure.get_element_distance() method

Open rpw199912j opened this issue 4 years ago • 4 comments

Summary

Added a new get_element_distance() method to the pymatgen.core.structure.Structure class that:

  • Calculates the minimum element-element distance for one element or between two different elements.
  • Automatically calculates with a supercell if any element in the input structure occupies only one site or needs additional sites to better reflect global symmetry.

Algorithm Overview

Using Fe2O3 as an example (assuming Fe occupies sites 0 and 1, and O occupies sites 2, 3, and 4):

  1. For each element in the structure, determine the site(s) it occupies. For example, Fe2O3 would produce {Fe: [0, 1], O: [2, 3, 4]}. When an element in the input structure occupies only one site, make a supercell of [2a, 2b, 2c] by default, so that the site is not compared with itself, which would give a distance of 0.
  2. Check if the specified element(s) can be found in the structure. If not, raise a ValueError.
  3. If two different elements are specified, create the Cartesian product of their site indices. For example, finding the Fe-O element distance in Fe2O3 would produce [(0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)]. If only one unique element is specified, create a list of pairwise combinations from the indices of that element. For example, finding the O-O element distance would produce [(2, 3), (2, 4), (3, 4)].
  4. For each pair of site indices, calculate the distance between the sites with the Structure.get_distance(i, j, jimage=None) method, resulting in a list of distances.
  5. Find the minimum of the distance list and set a maximum cutoff distance of minimum_distance * (1 + threshold), where threshold is an arbitrary tolerance value. Collect all the distances in the range [minimum_distance, minimum_distance * (1 + threshold)] and return their average, with the option to return the collected list itself.
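
The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the code from this pull request: the function names and the default threshold of 0.1 are assumptions made for the example, and step 4 (calling Structure.get_distance(i, j) for each pair) is left out so that the sketch runs without pymatgen.

```python
from itertools import combinations, product
from statistics import mean


def site_indices_by_element(species):
    """Step 1: map each element symbol to the site indices it occupies."""
    mapping = {}
    for index, symbol in enumerate(species):
        mapping.setdefault(symbol, []).append(index)
    return mapping


def index_pairs(mapping, elem_1, elem_2=None):
    """Steps 2-3: validate the elements, then build the site-index pairs."""
    elem_2 = elem_1 if elem_2 is None else elem_2
    for elem in (elem_1, elem_2):
        if elem not in mapping:
            raise ValueError(f"{elem} is not in the structure")
    if elem_1 == elem_2:
        # One unique element: unordered pairs of distinct sites.
        return list(combinations(mapping[elem_1], 2))
    # Two different elements: Cartesian product of the two index lists.
    return list(product(mapping[elem_1], mapping[elem_2]))


def average_near_minimum(distances, threshold=0.1):
    """Step 5: average the distances within [d_min, d_min * (1 + threshold)].

    The default threshold is a hypothetical value chosen for this sketch.
    """
    d_min = min(distances)
    cutoff = d_min * (1 + threshold)
    selected = [d for d in distances if d <= cutoff]
    return mean(selected), selected


# Fe2O3 example from the text: Fe on sites 0 and 1, O on sites 2, 3, and 4.
mapping = site_indices_by_element(["Fe", "Fe", "O", "O", "O"])
fe_o_pairs = index_pairs(mapping, "Fe", "O")
o_o_pairs = index_pairs(mapping, "O")
```

In a real structure, each pair (i, j) would be passed to Structure.get_distance(i, j) and the resulting list handed to the averaging step.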

Additional dependencies introduced (if any)

  • None

TODO (if any)

This is my first time making a pull request. Any comment and advice would be much appreciated!

So far, I've added several tests for the following categories:

  • [x] The Si-Si distance in the default structure that comes with the StructureTest class.
  • [x] Raising a ValueError if elements that are not in the input structure are passed in as parameters.
  • [x] The Mg-O distance in a disordered structure with fractional occupancy modified from the default StructureTest structure.
  • [x] The Si-Si distance in a custom-defined simple cubic structure with the lengths of all cell edges set to be 1 angstrom. This structure only has one silicon site specified at the [0, 0, 0] corner, so a supercell of [2a, 2b, 2c] should be made in order to obtain the correct distance of 1 instead of 0.
  • [x] Making sure the input structure is not changed in place whenever a supercell is made.
  • [x] Using the same simple cubic structure as in the fourth test, the Si-Si distance when the threshold value is set to 1, which means all distances less than or equal to 2 angstroms are included. As a result, the calculated distance should be greater than 1 because the face-diagonal and body-diagonal distances are now accepted.
  • [x] Whether or not the original minimum distances are returned in a NumPy array if return_lst is set to be True.
  • [x] Whether or not only unique values are returned if only_unique is set to be True.
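
The simple cubic tests can be checked by hand with a short standalone sketch. It does not use pymatgen or the method from this pull request; the nearest-image helper below only works for a cubic cell, and all names are chosen for illustration. It builds the eight sites of the [2a, 2b, 2c] supercell, computes all pairwise nearest-image distances, and compares the average for a threshold of 0 and a threshold of 1.

```python
from itertools import combinations, product
from math import sqrt
from statistics import mean

EDGE = 1.0   # edge length of the primitive cubic cell, in angstroms
SCALE = 2    # [2a, 2b, 2c] supercell

# The single site at the origin becomes 8 sites in the supercell.
sites = [tuple(EDGE * n for n in shift) for shift in product(range(SCALE), repeat=3)]
cell_length = EDGE * SCALE


def nearest_image_distance(p, q):
    """Distance from p to the closest periodic image of q (cubic cell only)."""
    total = 0.0
    for a, b in zip(p, q):
        delta = (b - a) % cell_length
        delta = min(delta, cell_length - delta)
        total += delta * delta
    return sqrt(total)


def average_near_minimum(values, threshold):
    """Average of all values within [d_min, d_min * (1 + threshold)]."""
    d_min = min(values)
    return mean(v for v in values if v <= d_min * (1 + threshold))


distances = [nearest_image_distance(p, q) for p, q in combinations(sites, 2)]

# threshold = 0 keeps only the 12 edge distances of 1 angstrom.
tight = average_near_minimum(distances, threshold=0.0)
# threshold = 1 raises the cutoff to 2 angstroms, so the face diagonals
# (sqrt(2)) and body diagonals (sqrt(3)) are averaged in as well.
loose = average_near_minimum(distances, threshold=1.0)
```

With a threshold of 0 the average is exactly 1, and with a threshold of 1 it rises above 1, which matches the behavior the tests expect.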

However, there are several tests concerning the use of jimage in the Structure.get_distance() method that I am not quite sure about, especially when it is used together with the scaling matrix.

  • [x] If jimage is set to None, the get_distance() method chooses the closest periodic image, and in some cases different scaling matrices produce the same element distance. When jimage is set to 0, the behavior changes, and it seems the get_distance() method now only calculates the distance for sites that fall strictly inside the cell. The following code segment can be used to reproduce this issue. (Note: the current implementation defaults to jimage=None so that the get_distance() method always uses the nearest periodic image.)
from pymatgen.core.lattice import Lattice
from pymatgen.core.structure import Structure

struct_primitive_3 = Structure(
    Lattice([[6, 0, 3], [4, 1, 3], [2, 0, 2]]),
    ["Fe", "O"],
    [[0, 0, 0], [0.5, 0, 0]],
)
struct_primitive_3_222 = struct_primitive_3.copy()
struct_primitive_3_222.make_supercell(scaling_matrix=[2, 2, 2])
struct_primitive_3_121 = struct_primitive_3.copy()
struct_primitive_3_121.make_supercell(scaling_matrix=[1, 2, 1])

print("Make sure we are using the same sites for the two different scaled structures")
for site_indices, scaling_matrix, struct_scaled in zip(
    [[0, 2], [0, 1]],
    [[2, 2, 2], [1, 2, 1]],
    [struct_primitive_3_222, struct_primitive_3_121],
):
    print(f"For the structure scaled by {scaling_matrix}")
    for site_index in site_indices:
        print(f"Site {site_index}")
        print(struct_scaled[site_index])

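# Same pair of sites in each supercell, using the nearest periodic image
print(struct_primitive_3_222.get_distance(0, 2, jimage=None))
print(struct_primitive_3_121.get_distance(0, 1, jimage=None))
# Same pair in the [1, 2, 1] supercell, with jimage=0 instead of the nearest image
print(struct_primitive_3_121.get_distance(0, 1, jimage=0))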
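
The difference between the nearest periodic image and a fixed image can be illustrated without pymatgen. The sketch below is an assumption-laden stand-in, not pymatgen's implementation: the distance() function, its three-integer jimage argument, and the search parameter are all hypothetical, and the nearest image is found by brute force over neighboring cells. A cubic cell is used so the numbers are easy to verify by hand.

```python
from itertools import product
from math import sqrt


def frac_to_cart(lattice, frac):
    """Convert fractional coordinates to Cartesian (lattice vectors as rows)."""
    return tuple(sum(frac[k] * lattice[k][axis] for k in range(3)) for axis in range(3))


def distance(lattice, frac_i, frac_j, jimage=None, search=1):
    """Distance from site i to site j under periodic boundary conditions.

    jimage=None returns the shortest distance among the images searched.
    Otherwise jimage is the integer lattice translation applied to site j.
    """
    if jimage is None:
        images = product(range(-search, search + 1), repeat=3)
    else:
        images = [tuple(jimage)]
    best = None
    for image in images:
        shifted = [frac_j[k] + image[k] - frac_i[k] for k in range(3)]
        length = sqrt(sum(c * c for c in frac_to_cart(lattice, shifted)))
        best = length if best is None else min(best, length)
    return best


# 4 angstrom cubic cell with two sites close to opposite faces along a.
cubic = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
site_i, site_j = [0.1, 0.0, 0.0], [0.9, 0.0, 0.0]

# Nearest image: site j shifted by (-1, 0, 0) is only 0.2 * 4 angstroms away.
nearest = distance(cubic, site_i, site_j)
# Fixed image (0, 0, 0): both sites as listed in the cell, 0.8 * 4 angstroms apart.
same_cell = distance(cubic, site_i, site_j, jimage=(0, 0, 0))
```

For a strongly skewed cell such as the one in the reproduction snippet, searching only the immediately neighboring images may miss the true nearest image, so the search parameter would need to be larger or the lattice reduced first.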

Please let me know if I should be adding more tests for additional edge cases, or if the current implementation of jimage should be changed.

rpw199912j avatar Oct 18 '21 07:10 rpw199912j

Thanks for the contribution. However, I think this is too specific an analysis to be added to pymatgen and it can be done with the existing get_distance_matrix coupled with the elemental information. Can you provide more details on where such an analysis would be useful that it justifies the creation of this method? Thanks.
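
One way to read this suggestion is sketched below: take a precomputed distance matrix (pymatgen exposes one for a structure through the Structure.distance_matrix property) and select the entries that belong to the two elements. The function name and the numbers in the example matrix are made up for illustration and do not come from a real structure.

```python
def min_element_distance(distance_matrix, species, elem_1, elem_2):
    """Smallest matrix entry between sites of elem_1 and sites of elem_2."""
    rows = [i for i, symbol in enumerate(species) if symbol == elem_1]
    cols = [j for j, symbol in enumerate(species) if symbol == elem_2]
    if not rows or not cols:
        raise ValueError("element not found in the species list")
    # Skip i == j so a site is never compared with itself (the diagonal is 0).
    candidates = [distance_matrix[i][j] for i in rows for j in cols if i != j]
    if not candidates:
        raise ValueError("only one site for this element; a supercell is needed")
    return min(candidates)


# Illustrative values only: a symmetric matrix with a zero diagonal.
species = ["Fe", "Fe", "O"]
matrix = [
    [0.0, 2.9, 2.0],
    [2.9, 0.0, 1.9],
    [2.0, 1.9, 0.0],
]

fe_o = min_element_distance(matrix, species, "Fe", "O")
fe_fe = min_element_distance(matrix, species, "Fe", "Fe")
```

The single-site case (O-O here) has no off-diagonal entry to select, which is the situation the proposed method handles by building a supercell.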

shyuep avatar Oct 26 '21 16:10 shyuep

The original motivation is that I was writing several machine learning featurizers that need the metal-metal and metal-nonmetal element distances to approximate the Hubbard U and charge-transfer gap, as seen in the Zaanen-Sawatzky-Allen framework. These two energy features can be used to differentiate metals from insulators.

As you mentioned, this is a convenience method to quickly get the distance between elements without having to manually look up elemental information beforehand, and it serves as the basis for the machine learning featurizers I used in my own research. I totally agree with you that people can achieve the same result using existing methods, but before writing this method, I often found myself working out the element-site mapping by printing out the structure, or loading the structure in visualization software such as VESTA to figure out which sites belong to which elements and which are closest to one another. The biggest utility of this method is that it does all of the above for you, accommodates disordered structures with fractional occupancy, and automatically makes a supercell when the input structure does not reflect global symmetry. It would be especially useful if you need to process hundreds if not thousands of structures in a high-throughput study.

I could just include this method within the Hubbard U and charge-transfer gap featurizers and submit a pull request to matminer, but I think this method would benefit a more general audience who are not just looking to classify metals and insulators for materials informatics research.

rpw199912j avatar Oct 26 '21 19:10 rpw199912j

Hi @rpw199912j, thank you for the thoughtful PR and explanation.

I agree with @shyuep that this is a somewhat niche functionality for general use but I can see the usefulness for the featurizer. The code is also very well-documented.

mkhorton avatar Nov 08 '21 19:11 mkhorton

@rpw199912j we can surely include this in matminer, please open a WIP PR on the matminer repo and tag me!

ardunn avatar Feb 12 '22 02:02 ardunn

Closing this as not planned since the functionality seems better placed in matminer.

janosh avatar Feb 03 '23 17:02 janosh