New return format is sometimes incorrect and much harder to convert to a DataFrame

Open: shyuep opened this issue 1 year ago · 9 comments

Example:

import pprint
from pymatgen.ext.matproj import MPRester
with MPRester("<api key>") as mpr:
    data = mpr.summary.search(
        chemsys=["*-O"],
        fields=["material_id", "formula_pretty", "energy_per_atom", "band_gap", "k_vrh"],
    )
# What is returned is a list. Let's see what the first item in the list looks like.
pprint.pprint(data[0])

gives

MPDataDoc<SummaryDoc>(
formula_pretty='In2O3',
material_id=MPID(mp-1245202),
energy_per_atom=-5.6541087496,
band_gap=0.6920999999999999,
fields_not_requested=['builder_meta', 'nsites', 'elements', 'nelements', 'composition', 'composition_reduced', 'formula_anonymous', 'chemsys', 'volume', 'density', 'density_atomic', 'symmetry', 'property_name', 'deprecated', 'deprecation_reasons', 'last_updated', 'origins', 'warnings', 'structure', 'task_ids', 'uncorrected_energy_per_atom', 'formation_energy_per_atom', 'energy_above_hull', 'is_stable', 'equilibrium_reaction_energy_per_atom', 'decomposes_to', 'xas', 'grain_boundaries', 'cbm', 'vbm', 'efermi', 'is_gap_direct', 'is_metal', 'es_source_calc_id', 'bandstructure', 'dos', 'dos_energy_up', 'dos_energy_down', 'is_magnetic', 'ordering', 'total_magnetization', 'total_magnetization_normalized_vol', 'total_magnetization_normalized_formula_units', 'num_magnetic_sites', 'num_unique_magnetic_sites', 'types_of_magnetic_species', 'k_voigt', 'k_reuss', 'k_vrh', 'g_voigt', 'g_reuss', 'g_vrh', 'universal_anisotropy', 'homogeneous_poisson', 'e_total', 'e_ionic', 'e_electronic', 'n', 'e_ij_max', 'weighted_surface_energy_EV_PER_ANG2', 'weighted_surface_energy', 'weighted_work_function', 'surface_anisotropy', 'shape_factor', 'has_reconstructed', 'possible_species', 'has_props', 'theoretical']
)

The returned document lists k_vrh under fields_not_requested even though it was requested. Setting all_fields=True, which is expensive, does not return k_vrh either.
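Until this is fixed, a quick client-side check can flag the problem. The sketch below is a hypothetical helper (`missing_requested_fields` is not part of pymatgen or mp_api). It assumes each result has already been turned into a plain dict (e.g. via `.dict()`) that may carry a `fields_not_requested` list like the one in the output above, and it reports requested fields that were either marked as not requested or came back as None. A None value can also simply mean the property has not been computed for that material, so hits are candidates to inspect rather than confirmed bugs.

```python
from typing import Any, Dict, Iterable, List


def missing_requested_fields(doc: Dict[str, Any], requested: Iterable[str]) -> List[str]:
    """Return the requested fields that the returned document did not populate.

    Hypothetical helper: assumes `doc` is a plain dict (e.g. produced by
    calling .dict() on a returned doc) that may contain a
    'fields_not_requested' list.
    """
    not_requested = set(doc.get("fields_not_requested") or [])
    missing = []
    for field in requested:
        # Flag a field if the server marked it as not requested, or if it is
        # absent/None. None can also mean the property was never computed for
        # this material, so treat hits as candidates to inspect.
        if field in not_requested or doc.get(field) is None:
            missing.append(field)
    return missing


# Values taken from the output above; fields_not_requested is abbreviated.
doc = {
    "material_id": "mp-1245202",
    "formula_pretty": "In2O3",
    "energy_per_atom": -5.6541087496,
    "band_gap": 0.6920999999999999,
    "fields_not_requested": ["density", "k_voigt", "k_reuss", "k_vrh"],
}
requested = ["material_id", "formula_pretty", "energy_per_atom", "band_gap", "k_vrh"]
print(missing_requested_fields(doc, requested))  # ['k_vrh']
```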

Further, the new return format is a list of SummaryDoc objects. These do not support a dict-like API, so converting MPRester query results to a pandas DataFrame, which used to be a one-liner, now requires an explicit conversion step. This is not ideal and not obvious to users.

import pandas as pd
# Old MPRester: results are plain dicts
df = pd.DataFrame(data)
# New MPRester: each returned doc must be converted to a dict first
df = pd.DataFrame([d.dict() for d in data])
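A small helper can hide that conversion step. This is a sketch under stated assumptions, not part of either library: `docs_to_records` is a hypothetical name, and it assumes each returned doc is either a plain dict or a pydantic-style model exposing `model_dump()` (pydantic v2) or `dict()` (pydantic v1). It also drops the `fields_not_requested` bookkeeping entry so that it does not turn into a DataFrame column.

```python
from typing import Any, Dict, Iterable, List, Optional


def docs_to_records(docs: Iterable[Any], fields: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    """Flatten query results into plain dicts suitable for pd.DataFrame.

    Hypothetical helper: accepts plain dicts (old-style results) or model
    objects exposing model_dump() (pydantic v2) or dict() (pydantic v1).
    If `fields` is given, only those keys are kept, in that order.
    """
    records = []
    for doc in docs:
        if isinstance(doc, dict):
            record = dict(doc)
        elif hasattr(doc, "model_dump"):
            record = doc.model_dump()
        elif hasattr(doc, "dict"):
            record = doc.dict()
        else:
            raise TypeError(f"cannot convert {type(doc).__name__} to a dict")
        if fields is not None:
            # Keep only the requested columns; missing ones become None.
            record = {f: record.get(f) for f in fields}
        else:
            # Drop the bookkeeping list so it is not stored as a column.
            record.pop("fields_not_requested", None)
        records.append(record)
    return records
```

With pandas installed, `pd.DataFrame(docs_to_records(data))` then works for both the old list-of-dicts results and the new doc objects; passing `fields=[...]` keeps only the requested columns in a fixed order.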

Sep 21 '22 17:09 shyuep