VASP Drone and `onsite_density_matrix` causing large document sizes
I have seen an example of a calculation (~200 atoms, ~50 SCF steps) where the task document size exceeds 16 MB (the MongoDB document limit), with the vast majority of that size due to the `onsite_density_matrix` parsed from the OUTCAR.
Creating this issue to keep an eye on it. Possible explanations are (1) a bug in parsing the matrix, (2) a sub-optimal representation of the matrix, or (3) that we shouldn't be storing this at all except for the last SCF step. I haven't had a chance to investigate further yet; if anyone wants a test file, let me know.
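A quick way to check which top-level key dominates the serialized size is something like the following (a sketch only, assuming the task document is loaded as a plain dict from JSON; the helper name is just for illustration):

```python
import json


def key_sizes_mb(doc: dict) -> dict:
    """Return the serialized size (in MB) of each top-level key in a document."""
    sizes = {}
    for key, value in doc.items():
        try:
            size_bytes = len(json.dumps(value, default=str).encode("utf-8"))
        except (TypeError, ValueError):
            size_bytes = 0  # skip values that can't be serialized
        sizes[key] = size_bytes / 1e6
    return sizes


# Example usage:
# with open("task_doc.json") as f:
#     task_doc = json.load(f)
# for key, mb in sorted(key_sizes_mb(task_doc).items(), key=lambda kv: -kv[1]):
#     print(f"{key}: {mb:.2f} MB")
```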
Thanks @mkhorton.
This is actually something we had to deal with in emmet-cli recently: https://github.com/materialsproject/emmet/blob/e30cbf2d6856d51dd7149ee253c4eb1ea969ddc9/emmet-cli/emmet/cli/utils.py#L394
I agree it would be better to handle this in the drone directly. Do you know of any potential uses for the `onsite_density_matrix` data? As in, is there any downside to always removing it?
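If we do remove it, a minimal sketch of what that could look like (assuming the parsed OUTCAR data lands under `calcs_reversed[*]["output"]["outcar"]`, which may differ from the actual task document layout):

```python
def strip_onsite_density_matrix(task_doc: dict) -> dict:
    """Drop onsite_density_matrix entries from a task document, in place."""
    # The calcs_reversed/output/outcar path is an assumption about where the
    # parsed OUTCAR data ends up in the task document.
    for calc in task_doc.get("calcs_reversed", []):
        outcar = calc.get("output", {}).get("outcar", {})
        outcar.pop("onsite_density_matrix", None)
    return task_doc
```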
@acrutt brought this to my attention; we can share the example file privately if it's helpful.
I don't think this is data we'd commonly need... I think I'm actually to blame for this: I added the parsing to `Outcar` two years ago, though I can't recall the context now.
In the example file, it ends up as a 15504-element list of dicts keyed by spin (+1, -1).
I think we could probably safely remove the key in the drone, and the way this data is represented could probably be improved in pymatgen at a later date: the current representation is basically a 1-to-1 copy of how the data is stored in the OUTCAR, just as a list of dicts, which I don't think is very sensible.
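Just to illustrate the kind of thing I mean (a rough sketch, not a concrete proposal for pymatgen, and it assumes the parsed data is a list with one entry per SCF step, each entry a dict keyed by spin whose values are the matrices): we could keep only the final step and store one dense array per spin channel.

```python
import numpy as np


def compact_onsite_density_matrices(per_step: list[dict]) -> dict:
    """Keep only the last SCF step; one array of matrices per spin channel."""
    if not per_step:
        return {}
    final_step = per_step[-1]  # assumes list order follows SCF step order
    return {
        str(spin): np.asarray(matrices)  # dense array instead of nested dicts
        for spin, matrices in final_step.items()
    }
```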