Problems retrieving TaskDocs for materials

Open keeganq opened this issue 1 year ago • 5 comments

I'm trying to retrieve charge density data, and the corresponding task information for the calculations that produced that data.

I'd like to be able to download the VASP input and output files associated with the volumetric charge density data for some materials.

Version Info

python==3.9.16
mp-api==0.30.10
pymatgen==2023.3.23
boto3==1.26.99
emmet-core==0.51.1

Reproduction

I'm trying to retrieve charge density for materials with inc_task_doc=True

from mp_api.client import MPRester

mpid = "mp-149"

with MPRester("<api_key>") as mpr:
    chgcar = mpr.get_charge_density_from_material_id(mpid, inc_task_doc=True) 

Produces output:


ValueError: No POTCAR for Si with functional PBE found. Please set the PMG_VASP_PSP_DIR environment in .pmgrc.yaml, or you may need to set PMG_DEFAULT_FUNCTIONAL to PBE_52 or PBE_54 if you are using newer psps from VASP.
Full Stack Trace

Retrieving MaterialsDoc documents: 100%|██████████| 1/1 [00:00<00:00, 27413.75it/s]
Retrieving ChgcarDataDoc documents: 100%|██████████| 2/2 [00:00<00:00, 60787.01it/s]
Retrieving ChgcarDataDoc documents: 100%|██████████| 1/1 [00:00<00:00, 25575.02it/s]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 2
      1 with MPRester("") as mpr:
----> 2     chgcar = mpr.get_charge_density_from_material_id(mpid, inc_task_doc=True) # task=True ?? Look at github
      3 #     print(chgcar)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/mprester.py:1101, in MPRester.get_charge_density_from_material_id(self, material_id, inc_task_doc)
   1098     raise MPRestError(f"No charge density fetched for {material_id}.")
   1100 if inc_task_doc:
-> 1101     task_doc = self.tasks.get_data_by_id(latest_doc.task_id)
   1102     return chgcar, task_doc
   1104 return chgcar

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/core/client.py:839, in BaseRester.get_data_by_id(self, document_id, fields)
    836 results = []  # type: List
    838 try:
--> 839     results = self._query_resource_data(criteria=criteria, fields=fields, suburl=document_id)  # type: ignore
    840 except MPRestError:
    842     if self.primary_key == "material_id":
    843         # see if the material_id has changed, perhaps a task_id was supplied
    844         # this should likely be re-thought

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/core/client.py:797, in BaseRester._query_resource_data(self, criteria, fields, suburl, use_document_model, timeout)
    774 def _query_resource_data(
    775     self,
    776     criteria: Optional[Dict] = None,
   (...)
    780     timeout: Optional[int] = None,
    781 ) -> Union[List[T], List[Dict]]:
    782     """
    783     Query the endpoint for a list of documents without associated meta information. Only
    784     returns a single page of results.
   (...)
    794         A list of documents
    795     """
--> 797     return self._query_resource(  # type: ignore
    798         criteria=criteria,
    799         fields=fields,
    800         suburl=suburl,
    801         use_document_model=use_document_model,
    802         chunk_size=1000,
    803         num_chunks=1,
    804     ).get("data")

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/core/client.py:295, in BaseRester._query_resource(self, criteria, fields, suburl, use_document_model, parallel_param, num_chunks, chunk_size, timeout)
    292         if not url.endswith("/"):
    293             url += "/"
--> 295     data = self._submit_requests(
    296         url=url,
    297         criteria=criteria,
    298         use_document_model=use_document_model,
    299         parallel_param=parallel_param,
    300         num_chunks=num_chunks,
    301         chunk_size=chunk_size,
    302         timeout=timeout,
    303     )
    305     return data
    307 except RequestException as ex:

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/core/client.py:429, in BaseRester._submit_requests(self, url, criteria, use_document_model, parallel_param, num_chunks, chunk_size, timeout)
    425 remaining_docs_avail = {}
    427 initial_params_list = [{"url": url, "verify": True, "params": copy(crit)} for crit in new_criteria]
--> 429 initial_data_tuples = self._multi_thread(use_document_model, initial_params_list)
    431 for data, subtotal, crit_ind in initial_data_tuples:
    433     subtotals.append(subtotal)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/core/client.py:634, in BaseRester._multi_thread(self, use_document_model, params_list, progress_bar, timeout)
    630 finished, futures = wait(futures, return_when=FIRST_COMPLETED)
    632 for future in finished:
--> 634     data, subtotal = future.result()
    636     if progress_bar is not None:
    637         progress_bar.update(len(data["data"]))

File ~/.conda/envs/materials-project/lib/python3.9/concurrent/futures/_base.py:439, in Future.result(self, timeout)
    437     raise CancelledError()
    438 elif self._state == FINISHED:
--> 439     return self.__get_result()
    441 self._condition.wait(timeout)
    443 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:

File ~/.conda/envs/materials-project/lib/python3.9/concurrent/futures/_base.py:391, in Future.__get_result(self)
    389 if self._exception:
    390     try:
--> 391         raise self._exception
    392     finally:
    393         # Break a reference cycle with the exception in self._exception
    394         self = None

File ~/.conda/envs/materials-project/lib/python3.9/concurrent/futures/thread.py:58, in _WorkItem.run(self)
     55     return
     57 try:
---> 58     result = self.fn(*self.args, **self.kwargs)
     59 except BaseException as exc:
     60     self.future.set_exception(exc)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/mp_api/client/core/client.py:685, in BaseRester._submit_request_and_process(self, url, verify, params, use_document_model, timeout)
    682 if response.status_code == 200:
    684     if self.monty_decode:
--> 685         data = json.loads(response.text, cls=MontyDecoder)
    686     else:
    687         data = json.loads(response.text)

File ~/.conda/envs/materials-project/lib/python3.9/json/__init__.py:359, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    357 if parse_constant is not None:
    358     kw['parse_constant'] = parse_constant
--> 359 return cls(**kw).decode(s)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:475, in MontyDecoder.decode(self, s)
    473 else:
    474     d = json.loads(s)
--> 475 return self.process_decoded(d)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:454, in MontyDecoder.process_decoded(self, d)
    451         elif (bson is not None) and modname == "bson.objectid" and classname == "ObjectId":
    452             return bson.objectid.ObjectId(d["oid"])
--> 454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
    457     return [self.process_decoded(x) for x in d]

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:454, in <dictcomp>(.0)
    451         elif (bson is not None) and modname == "bson.objectid" and classname == "ObjectId":
    452             return bson.objectid.ObjectId(d["oid"])
--> 454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
    457     return [self.process_decoded(x) for x in d]

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:457, in MontyDecoder.process_decoded(self, d)
    454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
--> 457     return [self.process_decoded(x) for x in d]
    459 return d

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:457, in <listcomp>(.0)
    454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
--> 457     return [self.process_decoded(x) for x in d]
    459 return d

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:454, in MontyDecoder.process_decoded(self, d)
    451         elif (bson is not None) and modname == "bson.objectid" and classname == "ObjectId":
    452             return bson.objectid.ObjectId(d["oid"])
--> 454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
    457     return [self.process_decoded(x) for x in d]

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:454, in <dictcomp>(.0)
    451         elif (bson is not None) and modname == "bson.objectid" and classname == "ObjectId":
    452             return bson.objectid.ObjectId(d["oid"])
--> 454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
    457     return [self.process_decoded(x) for x in d]

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:454, in MontyDecoder.process_decoded(self, d)
    451         elif (bson is not None) and modname == "bson.objectid" and classname == "ObjectId":
    452             return bson.objectid.ObjectId(d["oid"])
--> 454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
    457     return [self.process_decoded(x) for x in d]

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:454, in <dictcomp>(.0)
    451         elif (bson is not None) and modname == "bson.objectid" and classname == "ObjectId":
    452             return bson.objectid.ObjectId(d["oid"])
--> 454     return {self.process_decoded(k): self.process_decoded(v) for k, v in d.items()}
    456 if isinstance(d, list):
    457     return [self.process_decoded(x) for x in d]

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/monty/json.py:427, in MontyDecoder.process_decoded(self, d)
    425 data = {k: v for k, v in d.items() if not k.startswith("@")}
    426 if hasattr(cls_, "from_dict"):
--> 427     return cls_.from_dict(data)
    428 if pydantic is not None and issubclass(cls_, pydantic.BaseModel):  # pylint: disable=E1101
    429     return cls_(**data)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/pymatgen/io/vasp/inputs.py:2262, in Potcar.from_dict(cls, d)
   2256 @classmethod
   2257 def from_dict(cls, d):
   2258     """
   2259     :param d: Dict representation
   2260     :return: Potcar
   2261     """
-> 2262     return Potcar(symbols=d["symbols"], functional=d["functional"])

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/pymatgen/io/vasp/inputs.py:2243, in Potcar.__init__(self, symbols, functional, sym_potcar_map)
   2241 self.functional = functional
   2242 if symbols is not None:
-> 2243     self.set_symbols(symbols, functional, sym_potcar_map)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/pymatgen/io/vasp/inputs.py:2339, in Potcar.set_symbols(self, symbols, functional, sym_potcar_map)
   2337 else:
   2338     for el in symbols:
-> 2339         p = PotcarSingle.from_symbol_and_functional(el, functional)
   2340         self.append(p)

File ~/.conda/envs/materials-project/lib/python3.9/site-packages/pymatgen/io/vasp/inputs.py:1897, in PotcarSingle.from_symbol_and_functional(symbol, functional)
   1895 d = SETTINGS.get("PMG_VASP_PSP_DIR")
   1896 if d is None:
-> 1897     raise ValueError(
   1898         f"No POTCAR for {symbol} with functional {functional} found. Please set the PMG_VASP_PSP_DIR "
   1899         "environment in .pmgrc.yaml, or you may need to set PMG_DEFAULT_FUNCTIONAL to PBE_52 or "
   1900         "PBE_54 if you are using newer psps from VASP."
   1901     )
   1902 paths_to_try = [
   1903     os.path.join(d, funcdir, f"POTCAR.{symbol}"),
   1904     os.path.join(d, funcdir, symbol, "POTCAR"),
   1905 ]
   1906 for p in paths_to_try:

ValueError: No POTCAR for Si with functional PBE found. Please set the PMG_VASP_PSP_DIR environment in .pmgrc.yaml, or you may need to set PMG_DEFAULT_FUNCTIONAL to PBE_52 or PBE_54 if you are using newer psps from VASP.

Looking through the stack trace, it looks like the API is trying to retrieve the docs associated with the latest task and is unable to locate some VASP files for that task.

To get around this issue, I also tried retrieving ALL of the download information for this material:

with MPRester("<api-key>") as mpr:
    data = mpr.get_download_info(material_ids=["mp-149"])

And received this output (task doc metadata, plus the NOMAD URL for each task where one exists):

({MPID(mp-149): [{'task_id': 'mp-655585',
    'calc_type': <CalcType.GGA_Static: 'GGA Static'>},
   {'task_id': 'mp-656511',
    'calc_type': <CalcType.GGA_NSCF_Line: 'GGA NSCF Line'>},
   {'task_id': 'mp-655936',
    'calc_type': <CalcType.GGA_NSCF_Uniform: 'GGA NSCF Uniform'>},
   {'task_id': 'mp-11721',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-149',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-1057373', 'calc_type': <CalcType.GGA_Static: 'GGA Static'>},
   {'task_id': 'mp-1057366',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-1057380',
    'calc_type': <CalcType.GGA_NSCF_Uniform: 'GGA NSCF Uniform'>},
   {'task_id': 'mp-1059585',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-1059589', 'calc_type': <CalcType.GGA_Static: 'GGA Static'>},
   {'task_id': 'mp-1059603',
    'calc_type': <CalcType.GGA_NSCF_Uniform: 'GGA NSCF Uniform'>},
   {'task_id': 'mp-1120258',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-1120259',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-1141021',
    'calc_type': <CalcType.GGA_DFPT_Dielectric: 'GGA DFPT Dielectric'>},
   {'task_id': 'mp-1248038',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>},
   {'task_id': 'mp-1249516',
    'calc_type': <CalcType.GGA_NMR_Electric_Field_Gradient: 'GGA NMR Electric Field Gradient'>},
   {'task_id': 'mp-1267607',
    'calc_type': <CalcType.GGA_NMR_Nuclear_Shielding: 'GGA NMR Nuclear Shielding'>},
   {'task_id': 'mp-1440634', 'calc_type': <CalcType.GGA_Static: 'GGA Static'>},
   {'task_id': 'mp-1686587',
    'calc_type': <CalcType.GGA_NSCF_Uniform: 'GGA NSCF Uniform'>},
   {'task_id': 'mp-1791788', 'calc_type': <CalcType.GGA_Static: 'GGA Static'>},
   {'task_id': 'mp-1594776',
    'calc_type': <CalcType.GGA_NSCF_Line: 'GGA NSCF Line'>},
   {'task_id': 'mp-1592727',
    'calc_type': <CalcType.GGA_NSCF_Line: 'GGA NSCF Line'>},
   {'task_id': 'mp-1947498',
    'calc_type': <CalcType.R2SCAN_Structure_Optimization: 'R2SCAN Structure Optimization'>},
   {'task_id': 'mp-1950734',
    'calc_type': <CalcType.PBESol_Structure_Optimization: 'PBESol Structure Optimization'>},
   {'task_id': 'mp-1059604',
    'calc_type': <CalcType.GGA_NSCF_Line: 'GGA NSCF Line'>},
   {'task_id': 'mp-1057384',
    'calc_type': <CalcType.GGA_NSCF_Line: 'GGA NSCF Line'>},
   {'task_id': 'mp-1536661',
    'calc_type': <CalcType.SCAN_Structure_Optimization: 'SCAN Structure Optimization'>},
   {'task_id': 'mp-2250750',
    'calc_type': <CalcType.GGA_NSCF_Uniform: 'GGA NSCF Uniform'>},
   {'task_id': 'mp-2299819',
    'calc_type': <CalcType.HSE06_Static: 'HSE06 Static'>},
   {'task_id': 'mp-2291052', 'calc_type': <CalcType.GGA_Static: 'GGA Static'>},
   {'task_id': 'mp-2683378',
    'calc_type': <CalcType.GGA_Structure_Optimization: 'GGA Structure Optimization'>}]},
 ['https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-11721',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-149',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1057366',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1057380',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1059585',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1059589',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1059604',
  'https://nomad-lab.eu/prod/rae/api/raw/query?external_id=mp-1057384'])

So I can get the task info this way, but it's not clear which of these calculations is associated with the material's charge density data.

The two questions I have:

  1. Is the ValueError seen with the get_charge_density_from_material_id method a bug?
  2. Is there a way to find the task_id that produced the charge density data for any one material? Then I could download the VASP files associated with that task.

Thanks for any help you can offer!

keeganq, Mar 24 '23 21:03

@keeganq, thanks for reporting this issue. This happens because the API client, by default, tries to deserialize data into the appropriate pymatgen objects. Since you have pymatgen installed but your POTCAR configuration is not fully set up, deserializing the Potcar entry fails. This is something we are aware of on our end, and we are planning a couple of different changes to fix it. For now, the easiest thing to do is to pass monty_decode=False to MPRester alongside your API key. This should disable all deserialization by the client.
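
A minimal sketch of the workaround described above, assuming the MPRester constructor accepts the monty_decode keyword as stated in this comment. The helper name fetch_chgcar_and_task is hypothetical, and the import is deferred so that the function can be defined even where mp-api is not installed.

```python
def fetch_chgcar_and_task(api_key, material_id="mp-149"):
    """Fetch a charge density and its task document without Monty decoding.

    Hypothetical helper: only the MPRester call and its keywords come from
    this thread; the function name and structure are illustrative.
    """
    # Deferred import so that defining this helper does not require mp-api.
    from mp_api.client import MPRester

    # monty_decode=False asks the client not to rebuild pymatgen objects
    # (such as Potcar) from the response, so no local POTCAR setup is needed.
    with MPRester(api_key, monty_decode=False) as mpr:
        chgcar, task_doc = mpr.get_charge_density_from_material_id(
            material_id, inc_task_doc=True
        )
    return chgcar, task_doc
```

Calling fetch_chgcar_and_task("<api_key>") needs a valid key and network access.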

Additionally, I have just realized that the latest changes to the TaskDoc model in emmet-core have broken pulling task data through the API. I have just pinned emmet-core<=0.50.0 and released a patch as mp-api==0.30.11. Before pulling data, I would update your installation of both packages.
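
One way to check whether an environment matches these pins, sketched with the standard library only. The helper names are hypothetical, the parser handles plain dotted versions only, and treating the mp-api requirement as "0.30.11 or newer" is an assumption rather than something stated in this comment.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(text):
    """Turn a simple dotted version such as '0.30.11' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def installed_release(package):
    """Return the installed version of `package` as a tuple, or None if absent."""
    try:
        return parse_release(version(package))
    except (PackageNotFoundError, ValueError):
        # Not installed, or a version string this simple parser cannot handle.
        return None


def meets_pins(mp_api, emmet_core):
    """Assumed reading of the pins: mp-api >= 0.30.11 and emmet-core <= 0.50.0."""
    return mp_api >= (0, 30, 11) and emmet_core <= (0, 50, 0)


# The versions from the original report do not satisfy the pins.
print(meets_pins(parse_release("0.30.10"), parse_release("0.51.1")))
# The patched combination does.
print(meets_pins(parse_release("0.30.11"), parse_release("0.50.0")))
```

In a live environment, installed_release("mp-api") and installed_release("emmet-core") would supply the two tuples, provided both packages are installed.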

munrojm, Mar 24 '23 22:03

Thanks @munrojm! This is looking much better now. I am able to retrieve a TaskDoc with get_charge_density_from_material_id(<mpid>, inc_task_doc=True). Would it be safe to assume that this TaskDoc is the one that is associated with the calculations used for the volumetric charge density data?

keeganq, Mar 27 '23 15:03

Yup! That is correct. The CHGCAR is taken from that specific calculation.

munrojm, Mar 27 '23 16:03

An update on this: I was able to configure pymatgen with a local set of POTCAR files, and was previously able to retrieve TaskDocs with monty_decode=True in MPRester, as you suggested. These TaskDocs would have decoded objects, specifically TaskDoc.orig_inputs.potcar would be a list of pymatgen.io.vasp.inputs.PotcarSingle objects.

Unfortunately, this isn't working after some recent changes to the API. The potcar is instead returned as an emmet Potcar object, i.e. it was not decoded. I think I've identified the problem, and it looks very intentional:

https://github.com/materialsproject/api/blob/3ffecd21a859d8a9314ce64faa0d76c15ad29c5c/mp_api/client/mprester.py#L216-L218

Assuming that this behavior was intended, is there a new recommended way to decode objects in the TaskDoc?

Thanks as always for your help!

keeganq, May 18 '23 19:05

I've actually disabled Monty decoding by default for the task endpoint while we work out a better solution for this. You can instead pass the data to the process_decoded method of MontyDecoder to decode it manually. Instantiating TaskDoc with the data as input arguments should also decode any data that isn't nested using monty.
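
A minimal sketch of the manual decoding route, assuming the data has already been converted to plain JSON-like dicts and lists. The helper name decode_manually is hypothetical; MontyDecoder.process_decoded is the method that appears in the stack trace earlier in this thread.

```python
def decode_manually(raw):
    """Decode Monty-serialized data (dicts carrying '@module' and '@class' keys).

    Hypothetical helper; `raw` is assumed to be plain JSON-like data, for
    example one field of a task document converted to a dict.
    """
    # Deferred import so that defining this helper does not require monty.
    from monty.json import MontyDecoder

    # process_decoded walks dicts and lists recursively and calls from_dict
    # on classes it recognises. As the earlier stack trace shows, decoding a
    # Potcar entry this way goes through Potcar.from_dict, which raised the
    # ValueError when no local POTCAR directory was configured.
    return MontyDecoder().process_decoded(raw)
```

Decoding only the fields that are needed, rather than the whole document, keeps a single undecodable entry from failing the rest.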

munrojm, May 19 '23 13:05