[Bug]: ValidationError 6 validation errors for MoleculeSummaryDoc builder_meta.build_date
Code snippet
from mp_api.client import MPRester
with MPRester(api_key) as mpr:
    docs = mpr.molecules.summary.search()
What happened?
I am trying to download molecules from the API using the code above. It works at first, but when the progress bar reaches 155361/221598, a ValidationError is raised. Could you please look into it?
Version
mp-api 0.41.2
Which OS?
- [ ] MacOS
- [X] Windows
- [ ] Linux
Log output
ValidationError: 6 validation errors for MoleculeSummaryDoc
builder_meta.build_date
Value error, Invalid isoformat string: '2023-11-07T22:35:04.718Z' [type=value_error, input_value={'$date': '2023-11-07T22:35:04.718Z'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.8/v/value_error
last_updated
Input should be a valid datetime [type=datetime_type, input_value={'$date': '2023-11-07T22:35:04.718Z'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.8/v/datetime_type
origins.0.last_updated
Value error, Invalid isoformat string: '2020-11-11T12:51:27.833Z' [type=value_error, input_value={'$date': '2020-11-11T12:51:27.833Z'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.8/v/value_error
origins.1.last_updated
Value error, Invalid isoformat string: '2023-08-03T18:52:51.206Z' [type=value_error, input_value={'$date': '2023-08-03T18:52:51.206Z'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.8/v/value_error
origins.2.last_updated
Value error, Invalid isoformat string: '2020-11-11T12:51:27.833Z' [type=value_error, input_value={'$date': '2020-11-11T12:51:27.833Z'}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.8/v/value_error
has_props
Input should be a valid list [type=list_type, input_value={'materials': True, 'ther...lse, 'substrates': True}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.8/v/list_type
I will investigate this. For now, you can pass use_document_model=False to MPRester to work around the issue. Note that the data will be returned as dictionaries instead of MPDataDoc objects.
Thanks for your reply. I reinstalled my conda env and downgraded python version to 3.8. It works now, although I'm unsure which step resolved the issue.
Just to bump this, I'm also running into the same issue. Setting fields=["structure"] (all I need in this case) avoids it.
Thanks for bumping this. We'll need to take a second look at it. As @munrojm mentioned, in the meantime, you can use the use_document_model=False argument to MPRester() to disable validation and return the data as simple dictionaries.
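For anyone landing here, the workaround can be sketched roughly as follows. The helper name `fetch_molecule_summaries` is made up for illustration; only `MPRester`, `use_document_model`, `molecules.summary.search`, and `fields` come from this thread. Treat it as a sketch under those assumptions, not an official recipe.

```python
from typing import Optional


def fetch_molecule_summaries(api_key: str, fields: Optional[list] = None) -> list:
    """Download molecule summary docs as plain dicts (hypothetical helper)."""
    # Imported lazily so the function can be defined without mp-api installed.
    from mp_api.client import MPRester

    # use_document_model=False is the workaround suggested in this thread:
    # it skips validation, so results are dictionaries, not MPDataDoc objects.
    with MPRester(api_key, use_document_model=False) as mpr:
        return mpr.molecules.summary.search(fields=fields)
```

Calling `fetch_molecule_summaries(api_key, fields=["structure"])` would combine both workarounds mentioned above; restricting `fields` alone was also reported to avoid the error.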
@kavanase I just released mp-api v0.44.0rc0 with a (temporary) fix for this. Could you give it a try? Thanks! Note that we've fixed the data behind the scenes already and this error shouldn't show up when you include a query (e.g. on nelements). Only full downloads without an explicit query are affected when using previous versions of the client.
@tschaume can you link to the PR that fixed the issue?
@kalvdans it would be PR #947 but I'd consider it a temporary fix since I simply disabled the rerouting of full download requests from our MongoDB to our OpenData repositories. We're working on sync'ing the data in the OpenData repo and should be able to revert #947 with the upcoming data release. HTH.
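As background on the date errors in the original log: the failing values look like MongoDB Extended JSON wrappers (`{'$date': '...'}`) rather than plain datetimes, which is why validation rejects them. If you are working with the raw dictionaries and hit such values, a minimal sketch for unwrapping them could look like this. The function name `unwrap_extended_json_dates` and the sample document are illustrative assumptions, not part of mp-api or emmet-core.

```python
from datetime import datetime, timezone
from typing import Any


def unwrap_extended_json_dates(value: Any) -> Any:
    """Recursively replace {'$date': '<ISO string>'} wrappers with datetimes."""
    if isinstance(value, dict):
        if set(value) == {"$date"} and isinstance(value["$date"], str):
            # datetime.fromisoformat before Python 3.11 rejects a trailing 'Z',
            # so rewrite it as an explicit UTC offset first.
            return datetime.fromisoformat(value["$date"].replace("Z", "+00:00"))
        return {key: unwrap_extended_json_dates(item) for key, item in value.items()}
    if isinstance(value, list):
        return [unwrap_extended_json_dates(item) for item in value]
    return value


# Sample document built from the timestamps shown in the log output above.
raw_doc = {
    "builder_meta": {"build_date": {"$date": "2023-11-07T22:35:04.718Z"}},
    "origins": [{"last_updated": {"$date": "2020-11-11T12:51:27.833Z"}}],
}
clean_doc = unwrap_extended_json_dates(raw_doc)
expected = datetime(2023, 11, 7, 22, 35, 4, 718000, tzinfo=timezone.utc)
print(clean_doc["builder_meta"]["build_date"] == expected)
```

This only addresses the date fields; the `has_props` and `partial_charges` errors involve a different shape mismatch and are not handled here.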
Hi @tschaume, I just checked with the new release, but I'm still getting a ValidationError:
The workaround is fine for my use case, by the way, but I guess it would be nice for this simpler call to work without issue too.
Full traceback:
ValidationError Traceback (most recent call last)
Cell In[1], line 4
1 from pymatgen.ext.matproj import MPRester
3 with MPRester() as mpr:
----> 4 docs = mpr.molecules.summary.search()
File ~/Packages/api/mp_api/client/routes/molecules/summary.py:131, in MoleculesSummaryRester.search(self, charge, spin_multiplicity, nelements, chemsys, deprecated, elements, exclude_elements, formula, has_props, molecule_ids, num_chunks, chunk_size, all_fields, fields)
123 query_params.update({"has_props": ",".join([i.value for i in has_props])})
125 query_params = {
126 entry: query_params[entry]
127 for entry in query_params
128 if query_params[entry] is not None
129 }
--> 131 return super()._search(
132 num_chunks=num_chunks,
133 chunk_size=chunk_size,
134 all_fields=all_fields,
135 fields=fields,
136 **query_params,
137 )
File ~/Packages/api/mp_api/client/core/client.py:1182, in BaseRester._search(self, num_chunks, chunk_size, all_fields, fields, **kwargs)
1160 """A generic search method to retrieve documents matching specific parameters.
1161
1162 Arguments:
(...)
1177 A list of documents.
1178 """
1179 # This method should be customized for each end point to give more user friendly,
1180 # documented kwargs.
-> 1182 return self._get_all_documents(
1183 kwargs,
1184 all_fields=all_fields,
1185 fields=fields,
1186 chunk_size=chunk_size,
1187 num_chunks=num_chunks,
1188 )
File ~/Packages/api/mp_api/client/core/client.py:1255, in BaseRester._get_all_documents(self, query_params, all_fields, fields, chunk_size, num_chunks)
1241 list_entries = sorted(
1242 (
1243 (key, len(entry.split(",")))
(...)
1250 reverse=True,
1251 )
1253 chosen_param = list_entries[0][0] if len(list_entries) > 0 else None
-> 1255 results = self._query_resource(
1256 query_params,
1257 fields=fields,
1258 parallel_param=chosen_param,
1259 chunk_size=chunk_size,
1260 num_chunks=num_chunks,
1261 )
1263 return results["data"]
File ~/Packages/api/mp_api/client/core/client.py:569, in BaseRester._query_resource(self, criteria, fields, suburl, use_document_model, parallel_param, num_chunks, chunk_size, timeout)
567 data["meta"]["total_doc"] = len(data["data"])
568 else:
--> 569 data = self._submit_requests(
570 url=url,
571 criteria=criteria,
572 use_document_model=not query_s3 and use_document_model,
573 parallel_param=parallel_param,
574 num_chunks=num_chunks,
575 chunk_size=chunk_size,
576 timeout=timeout,
577 )
578 return data
580 except RequestException as ex:
File ~/Packages/api/mp_api/client/core/client.py:716, in BaseRester._submit_requests(self, url, criteria, use_document_model, chunk_size, parallel_param, num_chunks, timeout)
703 remaining_docs_avail = {}
705 initial_params_list = [
706 {
707 "url": url,
(...)
713 for crit in new_criteria
714 ]
--> 716 initial_data_tuples = self._multi_thread(
717 self._submit_request_and_process, initial_params_list
718 )
720 for data, subtotal, crit_ind in initial_data_tuples:
721 subtotals.append(subtotal)
File ~/Packages/api/mp_api/client/core/client.py:938, in BaseRester._multi_thread(self, func, params_list, progress_bar)
935 finished, futures = wait(futures, return_when=FIRST_COMPLETED)
937 for future in finished:
--> 938 data, subtotal = future.result()
940 if progress_bar is not None:
941 if isinstance(data, dict):
File ~/miniconda3/lib/python3.10/concurrent/futures/_base.py:451, in Future.result(self, timeout)
449 raise CancelledError()
450 elif self._state == FINISHED:
--> 451 return self.__get_result()
453 self._condition.wait(timeout)
455 if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]:
File ~/miniconda3/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
401 if self._exception:
402 try:
--> 403 raise self._exception
404 finally:
405 # Break a reference cycle with the exception in self._exception
406 self = None
File ~/miniconda3/lib/python3.10/concurrent/futures/thread.py:58, in _WorkItem.run(self)
55 return
57 try:
---> 58 result = self.fn(*self.args, **self.kwargs)
59 except BaseException as exc:
60 self.future.set_exception(exc)
File ~/Packages/api/mp_api/client/core/client.py:1010, in BaseRester._submit_request_and_process(self, url, verify, params, use_document_model, timeout)
1007 # other sub-urls may use different document models
1008 # the client does not handle this in a particularly smart way currently
1009 if self.document_model and use_document_model:
-> 1010 data["data"] = self._convert_to_model(data["data"])
1012 meta_total_doc_num = data.get("meta", {}).get("total_doc", 1)
1014 return data, meta_total_doc_num
File ~/Packages/api/mp_api/client/core/client.py:1046, in BaseRester._convert_to_model(self, data)
1036 def _convert_to_model(self, data: list[dict]):
1037 """Converts dictionary documents to instantiated MPDataDoc objects.
1038
1039 Args:
(...)
1044
1045 """
-> 1046 raw_doc_list = [self.document_model.model_validate(d) for d in data] # type: ignore
1048 if len(raw_doc_list) > 0:
1049 data_model, set_fields, _ = self._generate_returned_model(raw_doc_list[0])
File ~/Packages/api/mp_api/client/core/client.py:1046, in <listcomp>(.0)
1036 def _convert_to_model(self, data: list[dict]):
1037 """Converts dictionary documents to instantiated MPDataDoc objects.
1038
1039 Args:
(...)
1044
1045 """
-> 1046 raw_doc_list = [self.document_model.model_validate(d) for d in data] # type: ignore
1048 if len(raw_doc_list) > 0:
1049 data_model, set_fields, _ = self._generate_returned_model(raw_doc_list[0])
File ~/miniconda3/lib/python3.10/site-packages/pydantic/main.py:509, in BaseModel.model_validate(cls, obj, strict, from_attributes, context)
507 # `__tracebackhide__` tells pytest and some other tools to omit this function from tracebacks
508 __tracebackhide__ = True
--> 509 return cls.__pydantic_validator__.validate_python(
510 obj, strict=strict, from_attributes=from_attributes, context=context
511 )
ValidationError: 7 validation errors for MoleculeSummaryDoc
partial_charges.NONE.resp
Input should be a valid list [type=list_type, input_value={'property_id': '90e05663...9, 0.532428, -0.265717]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/list_type
partial_charges.NONE.mulliken
Input should be a valid list [type=list_type, input_value={'property_id': '26f3d39d...51, 0.01468, -0.094106]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/list_type
partial_charges.SOLVENT=WATER.resp
Input should be a valid list [type=list_type, input_value={'property_id': 'b684db14...2, -0.244941, 0.444549]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/list_type
partial_charges.SOLVENT=WATER.mulliken
Input should be a valid list [type=list_type, input_value={'property_id': 'c614aa0c...53, -0.740996, 0.28759]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/list_type
partial_spins.NONE.mulliken
Input should be a valid list [type=list_type, input_value={'property_id': '0bd15a66...86, 0.154226, 0.355101]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/list_type
partial_spins.SOLVENT=WATER.mulliken
Input should be a valid list [type=list_type, input_value={'property_id': '9d206e72...2941, 0.90596, 2.8e-05]}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/list_type
has_props
Input should be a valid list [type=list_type, input_value={'molecules': True, 'bond...True, 'vibration': True}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.6/v/list_type
Try using the officially supported mp_api client:
from mp_api.client import MPRester
with MPRester() as mpr:
    docs = mpr.molecules.summary.search(nelements=7)
We generally recommend not mixing the pymatgen and mp_api clients.
Hi @tschaume,
I may be doing something wrong, but using that code I still get a ValidationError:
Which emmet-core version are you running? You might have to upgrade emmet-core since it contains the model definitions used for validation.
I was using emmet-core-0.84.2rc4 and have now tried the latest PyPI release (emmet-core-0.84.2), but I'm still getting the same error.
Thanks for checking @kavanase! We're in the middle of preparing a new data release that will hopefully fix it. We'll get back to you.
@kalvdans @kavanase The new data release v2024.11.14 is out and the mp-api library has been updated accordingly. Upgrading to mp-api==0.44.0 should fix the issue.