`ProvenanceRester`: query parameters which cannot be used: `nsites`, `elements` even though in `mpr.provenance.available_fields`

Open sgbaird opened this issue 3 years ago • 5 comments

with MPRester(api_key) as mpr:
    mpr.provenance.search(nsites=(1, 52), elements=["V"])
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\sterg\Miniconda3\envs\mp-time-split\lib\site-packages\mp_api\core\client.py", line 786, in search
    return self._get_all_documents(
  File "C:\Users\sterg\Miniconda3\envs\mp-time-split\lib\site-packages\mp_api\core\client.py", line 835, in _get_all_documents
    results = self._query_resource(
  File "C:\Users\sterg\Miniconda3\envs\mp-time-split\lib\site-packages\mp_api\core\client.py", line 288, in _query_resource
    data = self._submit_requests(
  File "C:\Users\sterg\Miniconda3\envs\mp-time-split\lib\site-packages\mp_api\core\client.py", line 387, in _submit_requests
    initial_data_tuples = self._multi_thread(use_document_model, initial_params_list)
  File "C:\Users\sterg\Miniconda3\envs\mp-time-split\lib\site-packages\mp_api\core\client.py", line 587, in _multi_thread
    data, subtotal = future.result()
  File "C:\Users\sterg\Miniconda3\envs\mp-time-split\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "C:\Users\sterg\Miniconda3\envs\mp-time-split\lib\concurrent\futures\_base.py", line 391, in __get_result
    raise self._exception
  File "C:\Users\sterg\Miniconda3\envs\mp-time-split\lib\concurrent\futures\thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\sterg\Miniconda3\envs\mp-time-split\lib\site-packages\mp_api\core\client.py", line 652, in _submit_request_and_process
    raise MPRestError(
mp_api.core.client.MPRestError: REST query returned with error status code 400 on URL https://api.materialsproject.org/provenance/?nsites=1&nsites=52&elements=V&_all_fields=True&_limit=1000 with message:
Request contains query parameters which cannot be used: nsites, elements
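
For reference, the query string in the failing URL is just the keyword arguments passed through as given. A minimal sketch using only the standard library; this is not the client's actual request-building code, only an illustration that reproduces the same query string (`_all_fields` and `_limit` are copied from the URL above):

```python
from urllib.parse import urlencode

# Keyword arguments from the failing call, plus the two extra parameters
# visible in the URL above. This is NOT mp_api's own code.
params = {
    "nsites": (1, 52),
    "elements": ["V"],
    "_all_fields": True,
    "_limit": 1000,
}

# doseq=True expands each sequence into repeated key=value pairs.
query = urlencode(params, doseq=True)
print(query)  # nsites=1&nsites=52&elements=V&_all_fields=True&_limit=1000
```

So nothing is dropped or renamed on the way out; the 400 comes from the endpoint rejecting `nsites` and `elements` as query parameters.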

Note that nsites and elements are in available_fields:

mpr.provenance.available_fields
['builder_meta', 'nsites', 'elements', 'nelements', 'composition', 'composition_reduced', 'formula_pretty', 'formula_anonymous', 'chemsys', 'volume', 'density', 'density_atomic', 'symmetry', 'property_name', 'material_id', 'deprecated', 'deprecation_reasons', 'last_updated', 'origins', 'warnings', 'created_at', 'references', 'authors', 'remarks', 'tags', 'theoretical', 'database_IDs', 'history']

Similar to #612, get_data_by_id() seems to work fine:

mpr.provenance.get_data_by_id('mp-771054')
ProvenanceDoc(builder_meta=EmmetMeta(emmet_version='0.18.0', pymatgen_version='2022.0.16', pull_request=644, database_version='2021.11.10', build_date=datetime.datetime(2021, 11, 25, 10, 20, 36, 310000)), nsites=None, elements=None, nelements=None, composition=None, composition_reduced=None, formula_pretty=None, formula_anonymous=None, chemsys=None, volume=None, density=None, density_atomic=None, symmetry=None, property_name='provenance', material_id=MPID(mp-771054), deprecated=False, deprecation_reasons=None, last_updated=datetime.datetime(2021, 11, 25, 10, 20, 36, 310000), origins=[], warnings=[], created_at=datetime.datetime(2021, 11, 25, 10, 20, 36, 310000), references=['@article{Jain2013,\nauthor = {Jain, Anubhav and Ong, Shyue Ping and Hautier, Geoffroy and Chen, Wei and Richards, William Davidson and Dacek, Stephen and Cholia, Shreyas and Gunter, Dan and Skinner, David and Ceder, Gerbrand and Persson, Kristin a.},\ndoi = {10.1063/1.4812323},\nissn = {2166532X},\njournal = {APL Materials},\nnumber = {1},\npages = {011002},\ntitle = {{The Materials Project: A materials genome approach to accelerating materials innovation}},\nurl = {http://link.aip.org/link/AMPADS/v1/i1/p011002/s1\\&Agg=doi},\nvolume = {1},\nyear = {2013}\n}\n\n@misc{MaterialsProject,\ntitle = {{Materials Project}},\nurl = {http://www.materialsproject.org}\n}'], authors=[Author(name='Materials Project', email='[email protected]')], remarks=[], tags=[], theoretical=True, database_IDs={}, history=[History(name='Materials Project Optimized Structure', url='http://www.materialsproject.org', description=None)])

sgbaird, Jun 02 '22 02:06

Related: https://matsci.org/t/how-do-i-do-a-time-split-of-materials-project-entries-e-g-pre-2018-vs-post-2018/42584/2

sgbaird, Jun 02 '22 03:06

#614

sgbaird, Jun 03 '22 21:06

Hi @sgbaird, the available_fields property refers to the data available from the endpoint, not necessarily to the fields you can use to query that data. As of the latest client release, v0.24.0 (yesterday afternoon), the input arguments of the generic search method are meant to show which fields can be used for querying.
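
One way to check this from a Python session is to look at the signature of the search method rather than at available_fields. A sketch, assuming the query fields are exposed as named keyword arguments as described above; `provenance_search` below is a hypothetical stand-in whose argument names are made up for illustration, so substitute `mpr.provenance.search` when running against the real client:

```python
import inspect


def provenance_search(deprecated=False, num_chunks=None, chunk_size=1000,
                      all_fields=True, fields=None):
    """Hypothetical stand-in for a rester's search method (made-up arguments)."""
    return []


def queryable_arguments(search_method):
    """Return the names of the keyword arguments a search method accepts."""
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    signature = inspect.signature(search_method)
    return [
        name
        for name, param in signature.parameters.items()
        if param.kind in keyword_kinds
    ]


def unsupported(search_method, **query):
    """Return the query keys that are not named arguments of search_method."""
    allowed = set(queryable_arguments(search_method))
    return sorted(key for key in query if key not in allowed)


print(unsupported(provenance_search, nsites=(1, 52), elements=["V"]))
# ['elements', 'nsites']
```

If the real method also accepts `**kwargs`, the signature alone is not conclusive, and the 400 response from the endpoint remains the final word.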

munrojm, Jun 03 '22 21:06

A workaround for me right now (which I just thought of and tested) is to fetch all the ProvenanceDocs via ProvenanceRester and use material_id to link them to the data pulled from SummaryRester.

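import pandas as pd

# MPRester and api_key are assumed to be in scope, as in the snippet at the top.
num_sites = (1, 4)
elements = ["V"]
fields = [
    "structure",
    "material_id",
    "theoretical",
    "energy_above_hull",
    "formation_energy_per_atom",
]
with MPRester(api_key) as mpr:
    # The summary endpoint accepts num_sites and elements as query arguments.
    results = mpr.summary.search(
        num_sites=num_sites, elements=elements, fields=fields
    )

    # Keep only the requested fields from each returned document.
    if fields is not None:
        field_data = [
            {field: getattr(r, field) for field in fields} for r in results
        ]
    else:
        field_data = results

    material_id = [fd["material_id"] for fd in field_data]

    # Index by the numeric part of the MPID so the rows sort numerically.
    index = [int(mid.replace("mp-", "")) for mid in material_id]
    df = pd.DataFrame(field_data, index=index)
    df = df.sort_index()

    # Experimentally observed entries only.
    expt_df = df.query("theoretical == False")
    expt_material_id = expt_df.material_id.tolist()

    # Fetch every provenance document (no query filters), then link by material_id.
    provenance_results = mpr.provenance.search(
        fields=["references", "material_id"]
    )
    provenance_ids = [pr.material_id for pr in provenance_results]
    prov_df = pd.Series(
        name="provenance", data=provenance_results, index=provenance_ids
    )
    # Provenance documents for the experimental entries selected above.
    expt_provenance = prov_df.loc[expt_material_id]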

Based on https://github.com/sparks-baird/mp-time-split/blob/d7a19db308663ff371cb677e34e6d2aaa3aeb5ce/src/mp_time_split/utils/data.py#L119-L162
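
The linking step itself does not depend on the API and can be tried offline. A sketch with placeholder records (the IDs and values below are invented purely for illustration and are not real Materials Project data), using plain dictionaries in place of the document objects:

```python
# Placeholder stand-ins for the summary and provenance results (invented values).
summary_results = [
    {"material_id": "mp-3", "theoretical": True},
    {"material_id": "mp-1", "theoretical": False},
    {"material_id": "mp-2", "theoretical": False},
]
provenance_results = [
    {"material_id": "mp-1", "references": ["ref-a"]},
    {"material_id": "mp-2", "references": ["ref-b", "ref-c"]},
    {"material_id": "mp-3", "references": []},
    {"material_id": "mp-4", "references": ["ref-d"]},
]


def numeric_id(material_id):
    """Numeric part of an 'mp-<n>' identifier, used for sorting."""
    return int(material_id.replace("mp-", ""))


# Experimental entries from the summary side, sorted by numeric ID.
expt_ids = sorted(
    (doc["material_id"] for doc in summary_results if not doc["theoretical"]),
    key=numeric_id,
)

# Index the provenance side by material_id, then look up each experimental ID.
prov_by_id = {doc["material_id"]: doc for doc in provenance_results}
expt_provenance = {
    mid: prov_by_id[mid]["references"] for mid in expt_ids if mid in prov_by_id
}
print(expt_provenance)  # {'mp-1': ['ref-a'], 'mp-2': ['ref-b', 'ref-c']}
```

The `if mid in prov_by_id` guard silently skips IDs that have no provenance record, whereas `.loc` with a list of labels raises a KeyError if any label is missing.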

It's fairly quick, too. I think this will work for me for now. Thank you for getting back to me!

sgbaird, Jun 03 '22 21:06

No problem! Glad this works for you and is fast. Thanks for posting the snippet. I'll keep this issue open until lists of MPIDs are supported by the provenance search endpoint. I have just added this to my list for the next emmet-api minor release.

munrojm, Jun 03 '22 22:06