
New return format is sometimes incorrect and is much harder to convert to a DataFrame

Open shyuep opened this issue 1 year ago • 9 comments

Example:

from pymatgen.ext.matproj import MPRester
with MPRester("<api key>") as mpr:
    data = mpr.summary.search(chemsys=["*-O"], fields=["material_id", "formula_pretty", "energy_per_atom", "band_gap", "k_vrh"])
# What is returned is a list. Let's see what the first item in the list looks like.
import pprint
pprint.pprint(data[0])

gives

MPDataDoc<SummaryDoc>(
formula_pretty='In2O3',
material_id=MPID(mp-1245202),
energy_per_atom=-5.6541087496,
band_gap=0.6920999999999999,
fields_not_requested=['builder_meta', 'nsites', 'elements', 'nelements', 'composition', 'composition_reduced', 'formula_anonymous', 'chemsys', 'volume', 'density', 'density_atomic', 'symmetry', 'property_name', 'deprecated', 'deprecation_reasons', 'last_updated', 'origins', 'warnings', 'structure', 'task_ids', 'uncorrected_energy_per_atom', 'formation_energy_per_atom', 'energy_above_hull', 'is_stable', 'equilibrium_reaction_energy_per_atom', 'decomposes_to', 'xas', 'grain_boundaries', 'cbm', 'vbm', 'efermi', 'is_gap_direct', 'is_metal', 'es_source_calc_id', 'bandstructure', 'dos', 'dos_energy_up', 'dos_energy_down', 'is_magnetic', 'ordering', 'total_magnetization', 'total_magnetization_normalized_vol', 'total_magnetization_normalized_formula_units', 'num_magnetic_sites', 'num_unique_magnetic_sites', 'types_of_magnetic_species', 'k_voigt', 'k_reuss', 'k_vrh', 'g_voigt', 'g_reuss', 'g_vrh', 'universal_anisotropy', 'homogeneous_poisson', 'e_total', 'e_ionic', 'e_electronic', 'n', 'e_ij_max', 'weighted_surface_energy_EV_PER_ANG2', 'weighted_surface_energy', 'weighted_work_function', 'surface_anisotropy', 'shape_factor', 'has_reconstructed', 'possible_species', 'has_props', 'theoretical']
)

The returned document lists k_vrh under fields_not_requested even though it was requested. Using the expensive all_fields=True does not return k_vrh either.
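One way to catch this kind of mismatch programmatically is to compare the requested field names against the `fields_not_requested` attribute shown in the output above. The following is a minimal sketch, not part of mp-api: the helper name `missing_requested_fields` is hypothetical, and a `SimpleNamespace` with a shortened `fields_not_requested` list stands in for the real document so that it runs without an API key.

```python
from types import SimpleNamespace


def missing_requested_fields(doc, requested):
    """Return the requested field names that the document reports as not requested.

    Hypothetical helper: it only assumes the document exposes a
    `fields_not_requested` list, as in the printed output above.
    """
    not_requested = set(getattr(doc, "fields_not_requested", []))
    return [name for name in requested if name in not_requested]


requested = ["material_id", "formula_pretty", "energy_per_atom", "band_gap", "k_vrh"]

# Stand-in for data[0]; values are copied from the output above, and the
# fields_not_requested list is shortened for illustration.
doc = SimpleNamespace(
    formula_pretty="In2O3",
    material_id="mp-1245202",
    energy_per_atom=-5.6541087496,
    band_gap=0.6920999999999999,
    fields_not_requested=["nsites", "k_vrh", "g_vrh"],
)

missing = missing_requested_fields(doc, requested)
print(missing)  # ['k_vrh']
```

With a real query, the same helper could be applied to each item of `data` to flag documents where a requested field was silently dropped.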

Further, the new return format is a list of SummaryDoc objects. These do not support a dict-like API, so what used to be an easy way to convert MPRester query results to a pandas DataFrame now requires an explicit conversion step. This is not ideal, and the required conversion is non-obvious.

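import pandas as pd

# Old MPRester: the results could be passed to pandas directly
df = pd.DataFrame(data)
# New MPRester: each document must first be converted to a dict
df = pd.DataFrame([d.dict() for d in data])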
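Because `d.dict()` presumably also carries keys that were never requested, it may help to keep only the requested columns when building the rows. The sketch below assumes nothing about the documents beyond the `.dict()` method used above. The `FakeDoc` class and the `docs_to_records` helper are hypothetical stand-ins, not mp-api names, and the single example row reuses the values from the output earlier in this issue.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a returned document; only .dict() matters here."""

    material_id: str
    formula_pretty: str
    band_gap: float
    k_vrh: Optional[float] = None  # requested but not returned in the example above

    def dict(self) -> Dict[str, Any]:
        return asdict(self)


def docs_to_records(docs: Iterable[Any], fields: List[str]) -> List[Dict[str, Any]]:
    """Convert documents to plain dict rows restricted to the requested fields.

    Fields missing from a document are filled with None so every row has
    the same keys.
    """
    records = []
    for doc in docs:
        full = doc.dict()
        records.append({name: full.get(name) for name in fields})
    return records


docs = [FakeDoc("mp-1245202", "In2O3", 0.6920999999999999)]
records = docs_to_records(docs, ["material_id", "formula_pretty", "band_gap", "k_vrh"])
print(records)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(records)`, which gives one column per requested field.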

shyuep, Sep 21 '22 17:09