
VASP Drone and `onsite_density_matrix` causing large document sizes

Open mkhorton opened this issue 4 years ago • 2 comments

I have seen an example of a calculation (~200 atoms, ~50 SCF steps) where the task document size goes past 16 MB, with the vast majority of this due to the `onsite_density_matrix` in the OUTCAR.

Creating this issue to keep an eye on it. Possibilities are (1) a bug in parsing the matrix, (2) a sub-optimal representation of the matrix, or (3) that we shouldn't be storing this at all, except perhaps for the last SCF step. I have not had an opportunity to investigate further yet. If anyone wants a test file, let me know.
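For anyone investigating, a minimal sketch of how one might check which key dominates a document's size. It uses JSON-serialized length as a rough proxy for BSON size, and the `outcar_doc` dict is a hypothetical stand-in, not the actual layout the drone produces. The 16 MB figure is MongoDB's maximum BSON document size.

```python
import json
from typing import Any, Dict

MONGO_DOC_LIMIT = 16 * 1024 * 1024  # MongoDB's maximum BSON document size, in bytes


def size_by_key(doc: Dict[str, Any]) -> Dict[str, int]:
    """Approximate serialized size in bytes of each top-level value.

    JSON length is only a proxy for BSON size, but it is enough to see
    which key dominates.
    """
    return {k: len(json.dumps(v).encode("utf-8")) for k, v in doc.items()}


# Hypothetical stand-in for the parsed OUTCAR section of a task document.
outcar_doc = {
    "efermi": 5.1,
    "onsite_density_matrix": [{"1": [[0.0] * 5] * 5, "-1": [[0.0] * 5] * 5}] * 1000,
}

sizes = size_by_key(outcar_doc)
largest = max(sizes, key=sizes.get)
print(largest, sizes[largest], sizes[largest] / MONGO_DOC_LIMIT)
```

Running this on a real task document (loaded as a plain dict) should show quickly whether the density matrix is the culprit or whether something else is also large.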

mkhorton • Jan 27 '21 18:01

Thanks @mkhorton.

This is actually something we had to deal with in emmet-cli recently: https://github.com/materialsproject/emmet/blob/e30cbf2d6856d51dd7149ee253c4eb1ea969ddc9/emmet-cli/emmet/cli/utils.py#L394

I agree it would be better to handle this in the drone directly. Do you know of any potential uses for the onsite_density_matrix data? As in, is there any downside to always removing it?
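As a rough illustration of what "always removing it" could look like, here is a sketch that strips a key from a nested document before insertion. This is not the emmet-cli or atomate implementation; `strip_key` is a hypothetical helper, and the nesting of `task_doc` below is assumed for the example only.

```python
from typing import Any


def strip_key(obj: Any, key: str) -> Any:
    """Return a copy of obj with every occurrence of key removed from nested dicts."""
    if isinstance(obj, dict):
        return {k: strip_key(v, key) for k, v in obj.items() if k != key}
    if isinstance(obj, list):
        return [strip_key(v, key) for v in obj]
    return obj


# Hypothetical task document; the real layout may differ.
task_doc = {
    "calcs_reversed": [
        {
            "output": {
                "outcar": {
                    "efermi": 1.0,
                    "onsite_density_matrix": [{"1": [[0.1]], "-1": [[0.2]]}],
                }
            }
        }
    ]
}

cleaned = strip_key(task_doc, "onsite_density_matrix")
print(cleaned)
```

The helper returns a new structure rather than mutating in place, so the original parsed data stays available if a caller still wants it.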

utf • Jan 27 '21 18:01

@acrutt brought this to my attention. We can share the example file privately if it's helpful.

I don't think this is data we'd commonly need... I think I'm actually to blame for this: I added the parsing to the Outcar two years ago, though I can't recall the context now.

In the example file, it ends up being a list of dicts (15504 elements) keyed by spin (+1, -1).

I think we could probably safely remove the key from the drone. The way this data is represented could probably also be improved at a later date in pymatgen: I think the current representation is basically a 1-to-1 equivalent of how it's stored in the Outcar, except as a list of dicts, and I don't think this is very sensible.
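One possible direction, purely as a sketch and not something pymatgen does: transpose the list of per-entry dicts into a single dict keyed by spin, so the spin keys are stored once rather than once per entry. The string keys `"1"` and `"-1"` and the helper name `transpose_by_spin` are assumptions for illustration. This alone would not shrink the data dramatically, but it gives a layout that is easier to slice (for example, to keep only the last entry).

```python
from collections import defaultdict
from typing import Any, Dict, List


def transpose_by_spin(entries: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn [{spin: matrix, ...}, ...] into {spin: [matrix, ...], ...}."""
    out: Dict[str, List[Any]] = defaultdict(list)
    for entry in entries:
        for spin, matrix in entry.items():
            out[spin].append(matrix)
    return dict(out)


# Tiny made-up example in the list-of-dicts shape described above.
entries = [
    {"1": [[0.9, 0.0], [0.0, 0.8]], "-1": [[0.1, 0.0], [0.0, 0.2]]},
    {"1": [[0.7, 0.0], [0.0, 0.6]], "-1": [[0.3, 0.0], [0.0, 0.4]]},
]

by_spin = transpose_by_spin(entries)
print(by_spin)
```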

mkhorton • Jan 27 '21 19:01