[Bug]: `chemsys` parameter not returning all entries in `chemsys`
Code snippet
# new API:
with MPRester("new_MP_API_key") as mpr:
Cd_Te_PD_docs = mpr.materials.summary.search(chemsys="Cd-Te")
print(len(Cd_Te_PD_docs)) # should be 23, returns 11
What happened?
Related to the bug report in https://github.com/materialsproject/api/issues/859
Using the chemsys parameter in the search() method (specifically for mpr.materials.summary.search here but this affects all BaseRester.search() methods), not all entries in the specified chemical system are actually being returned.
Correct behaviour with legacy API:
But with the new API and setting chemsys = "Cd-Te", only Cd_xTe_y entries are returned, and not Cd or Te-only entries (which should also be returned):
Can test this for other material systems too, by just comparing the returned docs from summary.search(chemsys="X-Y") with the output of searching "X-Y" on the Materials Project website.
Version
v0.41.2
Which OS?
- [X] MacOS
- [ ] Windows
- [ ] Linux
Log output
No response
And also just to clarify, get_entries_in_chemsys for the new MPRester also works fine and as expected, it's just summary.search() that has the issue:
I've found the issue, it's that in get_entries_in_chemsys(), it uses this code to convert the input chemsys parameter to a format that works for the summary.search() methods:
https://github.com/materialsproject/api/blob/2c13d6a515972503b9b52447df6986974c5affd5/mp_api/client/mprester.py#L1179C9-L1187C60
if isinstance(elements, str):
elements = elements.split("-")
elements_set = set(elements) # remove duplicate elements
all_chemsyses = []
for i in range(len(elements_set)):
for els in itertools.combinations(elements_set, i + 1):
all_chemsyses.append("-".join(sorted(els)))
I think for consistency it would be best for this handling to be applied for all uses of chemsys, not just in this function? So the handling of chemsys across all the functions (and as used on the Materials Project website) is consistent?
The only issue is there needs to be a way to query for a single chemical system, and not just always query for what in this case turns out to be a list (e.g. ["Cd-Te", "Cd", "Te"]. Since the original rester class had the get_entries_in_chemsys method which queried for ALL entries in the parent chemical system as well as all of it's subsystems, we elected to keep it that way. We still will want to maintain the existing query functionality of summary.search. Perhaps this is an issue that can be fixed in the docs.
Ok! Yeah then maybe just worth clarification in the docs/docstrings? Just because "chemical system" on the MP website and in the old methods meant everything in that chemical space (i.e. "Cd-Te" -> CdxTey where x/y are any combination of integers including zero) rather than the current where it means everything with that exact same combination of elements (but with variable stoichiometry); i.e. where x/y must be > 0
Of course the responsibility lies with MP to make sure documentation is correct, but adding a note that the public docs accept PRs too if there is ever a quick change to suggest that might help provide clarification for other users: https://github.com/materialsproject/public-docs
I confess I would have also found this confusing. I suspect the underlying reason is because get_entries_in_chemsys is often used to quickly obtain necessary data to create a PhaseDiagram object, so is also useful to include subsystems.
Have changed the language of the docstring. Feel free to submit a PR if you feel it isn't enough @kavanase.
Have changed the language of the docstring.
For reference, this was https://github.com/materialsproject/api/commit/801c05c3f11329bfd0946a8515cc057902c483d4
@mkhorton @kalvdans I've shortened the intro page in our docs and added a prominent note asking the community for help improving the docs if needed.