[Bug]: No longer a way to obtain charge density from task_id
Code snippet
from mp_api.client import MPRester

# given some task id:
task_id = "mp-1805669"
# used to be able to do this (MP_API_KEY and deserialize are defined elsewhere)
with MPRester(MP_API_KEY, monty_decode=deserialize) as mpr:
    results = mpr.materials.search(task_ids=[task_id])
    assert len(results) == 1
    chgcar = mpr.charge_density.get_charge_density_from_file_id(results[0].fs_id)
What happened?
There is currently no way to obtain charge density given some task id. MPRester.charge_density no longer exists, and there is no get_charge_density_from_file_id() function within MPRester.
I noticed there is a TODO related to this as well.
A workaround is to call the private tasks._query_open_data function manually, though I'm not sure this is robust to future API changes. I'm also not 100% sure it is correct, since I don't think the task_id always matches the file path in the key below.
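from monty.json import MontyDecoder
from mp_api.client import MPRester

# given some task id:
task_id = "mp-1805669"
with MPRester(MP_API_KEY) as mpr:
    decoder = MontyDecoder().decode
    # _query_open_data is a private method and may change between releases
    chgcar = (
        mpr.tasks._query_open_data(
            bucket="materialsproject-parsed",
            key=f"chgcars/{str(task_id)}.json.gz",
            decoder=decoder,
            fields=["data"],
        )[0][0]["data"]
        or {}
    )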
Version
0.41.2
Which OS?
- [ ] MacOS
- [ ] Windows
- [X] Linux
Log output
No response
It looks like there is a soft guarantee that the task-id and Chgcar data match, so if they don't for any task, there might be something wrong with a builder.
https://github.com/materialsproject/emmet/blob/a0ece971a1cc63cdb8fb6c6e463bbc3754d3c0db/emmet-core/emmet/core/charge_density.py#L20-L24
However, you can have a backup plan: if the task-id does not give a Chgcar object, get the mp-id and look for the Chgcar that way.
If there is a mismatch with the task-ids (a Chgcar with a mismatched task_id), @tschaume might be able to help.
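The backup plan described here could be sketched roughly as follows. This is only an illustration under stated assumptions: the three callables (`fetch_by_task`, `task_to_material`, `fetch_by_material`) are hypothetical and injected by the caller, so the sketch does not claim any particular mp-api method or signature. Note that the material-level fallback may return a charge density from a different calculation than the one requested, so the function reports which path was used.

```python
from typing import Any, Callable, Optional, Tuple


def fetch_chgcar_with_fallback(
    task_id: str,
    fetch_by_task: Callable[[str], Optional[Any]],
    task_to_material: Callable[[str], Optional[str]],
    fetch_by_material: Callable[[str], Optional[Any]],
) -> Tuple[Optional[Any], str]:
    """Return (chgcar, source), where source is 'task', 'material', or 'missing'.

    All three callables are hypothetical hooks supplied by the caller.
    """
    try:
        chgcar = fetch_by_task(task_id)
    except Exception:
        # Deliberately broad: the exact exception type raised for a missing
        # object is an assumption this sketch does not rely on.
        chgcar = None
    if chgcar is not None:
        return chgcar, "task"

    # Fall back to the material (mp-id) that the task belongs to.
    material_id = task_to_material(task_id)
    if material_id is None:
        return None, "missing"

    chgcar = fetch_by_material(material_id)
    if chgcar is not None:
        return chgcar, "material"
    return None, "missing"
```

Anyone who needs exact reproducibility should treat a "material" result as a different data point from the one originally requested.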
Thanks @jmmshn, it appears that there might sometimes be a mismatch. For some extra context, we had collected all of the available charge densities to train a machine learning model that predicts charge density from atomic configuration. For reproducibility, we stored the task_id of each downloaded chgcar so that others could download the same data (see our mpid_to_task_id_map.json). If we try to download the chgcars again with the above code using key=f"chgcars/{str(task_id)}.json.gz", we get an error like:
unable to access bucket: 'materialsproject-parsed' key: 'chgcars/mp-1439955.json.gz' version: None error: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
for some of the task_ids (e.g., mp-1439955 above), although many of them work fine.
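To see which of a stored list of task_ids still resolve, one option is to attempt each download and record the failures instead of stopping at the first NoSuchKey. A minimal sketch, assuming a caller-supplied `fetch` callable (hypothetical; it could wrap the `_query_open_data` call above) that raises when the object does not exist:

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def partition_by_availability(
    task_ids: Iterable[str],
    fetch: Callable[[str], Any],
) -> Tuple[Dict[str, Any], List[Tuple[str, str]]]:
    """Split task_ids into those that download and those that fail.

    `fetch` is a hypothetical callable that returns the charge density for a
    task_id or raises if it cannot be retrieved.
    Returns (found, missing), where missing holds (task_id, error message).
    """
    found: Dict[str, Any] = {}
    missing: List[Tuple[str, str]] = []
    for task_id in task_ids:
        try:
            found[task_id] = fetch(task_id)
        except Exception as exc:
            # Broad on purpose: the concrete exception class raised by the
            # client for a missing key is not assumed here.
            missing.append((task_id, str(exc)))
    return found, missing
```

Keeping the error message next to each failed id makes it easier to tell a missing object apart from an unrelated failure such as a network error.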
Thanks for tagging me @jmmshn.
@teddykoker this issue is on our radar and we're working on getting it resolved with one of our next data releases. See this forum post for more. HTH.
Thanks @tschaume! Just to confirm -- the above NoSuchKey error is due to some API issue and not due to a mismatch of the task_id and file path?
I doubt it's a mismatch of task_id and object/file path. It's possible that the CHGCARs for the specific task_ids simply weren't properly transferred to the new repository we're using behind the scenes. Let me see if I can do a live patch now and get back to you.
Thank you @tschaume!
@teddykoker I added about 11.4k new CHGCARs to our data repository which should complete the set of CHGCARs expected to be there for the get_charge_density_from_material_id() MPRester function. That doesn't solve the issue of retrieving charge densities by task_id if the requested task doesn't correspond to the material's latest static calculation. For instance, for material mp-1 the latest static task is mp-2246557 and not mp-1439955. I'm trying to figure out if I can find and add those to the data repository which would allow you to retrieve them via _query_open_data().
Thanks again @tschaume. Should we in general expect old calculations to be available in the future? It would be helpful for the sake of reproducibility of works leveraging older calculations, but it is understandable that this could be more costly with respect to storage.
Regarding this Issue, would there be some desire for a MPRester.get_charge_density_from_task_id() function? This would probably just be a copy of these lines with the MPRester.get_charge_density_from_material_id() function calling this after finding the latest doc/task id. I would be happy to submit a PR for this or similar function if you'd like this added to the API.
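The structure proposed here, with the material-level call resolving the latest task id and then delegating to a task-level call, might look roughly like the sketch below. `ChargeDensityClient` and its two dictionaries are hypothetical stand-ins so the example runs offline; this is not the MPRester implementation. The sample ids are the ones @tschaume gave above for mp-1.

```python
from typing import Any, Dict, Optional


class ChargeDensityClient:
    """Hypothetical stand-in for the relevant part of MPRester."""

    def __init__(
        self,
        chgcar_store: Dict[str, Any],
        latest_task_for_material: Dict[str, str],
    ) -> None:
        # In-memory substitutes for the remote data store and the
        # material -> latest static task lookup (both are assumptions).
        self._chgcar_store = chgcar_store
        self._latest_task_for_material = latest_task_for_material

    def get_charge_density_from_task_id(self, task_id: str) -> Optional[Any]:
        """Return the charge density stored under task_id, or None."""
        return self._chgcar_store.get(task_id)

    def get_charge_density_from_material_id(self, material_id: str) -> Optional[Any]:
        """Resolve the latest task for the material, then delegate."""
        task_id = self._latest_task_for_material.get(material_id)
        if task_id is None:
            return None
        return self.get_charge_density_from_task_id(task_id)


client = ChargeDensityClient(
    chgcar_store={"mp-2246557": "chgcar for mp-2246557"},
    latest_task_for_material={"mp-1": "mp-2246557"},
)
latest = client.get_charge_density_from_material_id("mp-1")
older = client.get_charge_density_from_task_id("mp-1439955")  # not in the store
```

With this split, the two entry points share one retrieval path, and an older task id simply returns nothing until its data is added to the store.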
We're in a transition right now that should enable us to make and keep older calculations available publicly going forward. I'll try to work on patching in as many old calculations and charge densities as I can but it might take some time and I can't guarantee completeness, unfortunately.
Yes, please submit a PR and tag me. Thank you!
Sounds good. Thank you for the effort in including the old calculations!
@teddykoker I merged your PR and released mp-api==0.42.1. I've been able to add a good chunk of charge densities to our data repo but there's still a good way to go.
Thanks! Let me know if there's anything else I can do to help.
@teddykoker I've uploaded about 150k additional CHGCARs for a total of 415k. I think this should at least guarantee that all currently used CHGCARs are available. Out of the 122689 in your list, 89723 are now available. I'll have to check if the remaining 32966 are lost or have simply been remapped to updated mp-ids.
Thanks for working on this! Is there a way on the user side to determine if a task id has been remapped?
Hi @tschaume, just following up regarding the missing task-ids. Is there any way to determine a mapping of old id -> new id if the missing task ids above have been updated? Thanks again for the help!