atomate2

Change how electrode flow retrieves charge density data

Open esoteric-ephemera opened this issue 1 year ago • 16 comments

Small change in the electrode insertion workflow to avoid needing large object storage: Currently there's a get_charge_density_job function which does not point its output to e.g. GridFS. That causes problems when trying to store a CHGCAR in normal mongo stores

I've moved the get_charge_density_job within get_inserted_structures so that the charge density is not stored when generating inserted structures (it still may be stored in the user's GridFS from the static calc)

Another solution would be just adding a data=[Chgcar] kwarg to the get_charge_density_job function, but I think it makes more sense to remove get_charge_density_job
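For context on why a CHGCAR can be too big for a normal mongo store: MongoDB caps a single document at 16 MiB. The rough sketch below estimates the serialized size of a dense float grid, using JSON text as a crude proxy for what ends up in the document. The grid shape is a made-up example, not one taken from a real calculation.

```python
import json
import random

# MongoDB caps a single BSON document at 16 MiB.
MONGO_MAX_DOC_BYTES = 16 * 1024 * 1024


def estimate_json_bytes(grid_shape, n_sample=1000, seed=0):
    """Roughly estimate the JSON size of a dense float grid of the given shape."""
    rng = random.Random(seed)
    sample = [rng.uniform(0.0, 100.0) for _ in range(n_sample)]
    # Average serialized bytes per value, including the list separator.
    bytes_per_value = len(json.dumps(sample)) / n_sample
    n_points = 1
    for dim in grid_shape:
        n_points *= dim
    return int(bytes_per_value * n_points)


# Hypothetical FFT grid; real grids depend on the cell and the cutoff.
grid_shape = (120, 120, 120)
size = estimate_json_bytes(grid_shape)
exceeds_limit = size > MONGO_MAX_DOC_BYTES
print(f"~{size / 1e6:.0f} MB serialized; exceeds the 16 MiB limit: {exceeds_limit}")
```

This is only an order-of-magnitude estimate, but it shows why a volumetric object needs a blob-style store such as GridFS, or should not be stored at all.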

esoteric-ephemera avatar Nov 12 '24 22:11 esoteric-ephemera

@esoteric-ephemera does it make sense to add the data to the datastore for other parts of the workflow? It will surely fail at some point

JaGeo avatar Nov 12 '24 22:11 JaGeo

It should work in this setup because the charge density is only used in one part of the flow (get_inserted_structures). If the user wanted to retrieve their charge densities for analysis, the host structure static calculation should store them in the datastore as well (this static is required in the electrode flow)

Maybe @jmmshn has thoughts on this - don't want to break the flow logic

esoteric-ephemera avatar Nov 13 '24 17:11 esoteric-ephemera

@esoteric-ephemera, copying the file over was kind of a fireworks-related hack. When this was added, whenever a large object was the input of a job, the UI that rendered the fireworks wf would try to render the entire thing, which basically crashed it every time. So you might want to confirm that this is no longer the case.

jmmshn avatar Nov 14 '24 18:11 jmmshn

Thanks for the suggestions @janosh, wasn't aware of the type hinting for Callable!

Thanks, @jmmshn. This is probably a safer approach then, since get_inserted_structures no longer takes VolumetricData as input, just the path to the previous calc and a function to parse it

esoteric-ephemera avatar Nov 15 '24 00:11 esoteric-ephemera

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 4.13%. Comparing base (4244da9) to head (59b7f26).

Files with missing lines Patch % Lines
src/atomate2/common/jobs/electrode.py 0.00% 3 Missing :warning:
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #1055       +/-   ##
==========================================
- Coverage   72.82%   4.13%   -68.69%     
==========================================
  Files         187     187               
  Lines       13637   13634        -3     
  Branches     1370    1370               
==========================================
- Hits         9931     564     -9367     
- Misses       3161   13039     +9878     
+ Partials      545      31      -514     
Files with missing lines Coverage Δ
src/atomate2/common/jobs/electrode.py 0.00% <0.00%> (-84.54%) :arrow_down:

... and 166 files with indirect coverage changes

codecov[bot] avatar Nov 25 '24 19:11 codecov[bot]

@esoteric-ephemera, it looks like the current version of get_charge_density_job

https://github.com/materialsproject/atomate2/blob/95ea0600e00bcded2a0cb6cfe6d190fa6a980c39/src/atomate2/common/jobs/electrode.py#L303

breaks for large objects because it needs to send the entire charge density to the jobstore. I think this can be fixed by adding data=True, but we will run into the same problem with jobflow + fireworks, since you now have the following charge density in the input:

https://github.com/materialsproject/atomate2/blob/95ea0600e00bcded2a0cb6cfe6d190fa6a980c39/src/atomate2/common/jobs/electrode.py#L97C9-L97C16

So I don't see any way of not causing the problem with fireworks where the object is deserialized during viewing of the job in the webgui. Maybe it's best to move on completely from fireworks anyways.

So I think the choices are:

  1. Go back to the old implementation where the charge density is never "sent" between jobs, so it never appears as an input.
  2. Just forget about any problems with viewing in fireworks... I have not used it in months and months, and we should just worry about jobflow-remote.

I think we are already mostly down the path of 2 anyways, so I can just completely refactor out the get_charge_density_job and send the charge density from the static calculation directly.

jmmshn avatar Jan 07 '25 21:01 jmmshn

Hey @jmmshn, the way I've rewritten it in the PR shouldn't hit either issue you've identified - it just slightly tweaks the electrode flow to take the path of the charge density file and a function as args. Should not have issues with storing large objects in Mongo, nor with rendering large objects in the fireworks webgui

We're still using fireworks, and I don't see a strong case to store the charge density in a database. Even in that case, the user can write a custom get_charge_density function that pulls from a database directly
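A minimal sketch of the "path plus parser function" pattern, assuming toy stand-ins throughout: parse_charge_density_from_dir, get_inserted_structures_sketch, density_from_database and the density.txt file are hypothetical names for illustration, not the atomate2 signatures.

```python
from __future__ import annotations

import tempfile
from collections.abc import Callable
from pathlib import Path
from typing import Any


def parse_charge_density_from_dir(prev_dir: str | Path) -> list[float]:
    """Hypothetical default parser: read a toy density file from a calc directory."""
    text = (Path(prev_dir) / "density.txt").read_text()
    return [float(tok) for tok in text.split()]


def get_inserted_structures_sketch(
    prev_dir: str | Path,
    get_charge_density: Callable[[str | Path], Any] = parse_charge_density_from_dir,
) -> dict[str, Any]:
    """Load the density inside the job so it is never a job input or output."""
    density = get_charge_density(prev_dir)
    # Stand-in for the real insertion-site search: return only small summary data.
    return {"n_points": len(density), "max_density": max(density)}


def density_from_database(prev_dir: str | Path) -> list[float]:
    """Hypothetical custom loader; a real one might query a database instead."""
    fake_db = {"host_static": [0.5, 2.5, 1.0]}
    return fake_db["host_static"]


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "density.txt").write_text("0.1 0.4 0.2 0.3")
    from_file = get_inserted_structures_sketch(tmp)

from_db = get_inserted_structures_sketch(
    "unused", get_charge_density=density_from_database
)
```

Only a path and a function reference cross the job boundary here, so nothing large has to be serialized into Mongo or rendered by the fireworks webgui.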

esoteric-ephemera avatar Jan 07 '25 23:01 esoteric-ephemera

So in my testing the following job fails with a "Document too large" error.

https://github.com/materialsproject/atomate2/blob/95ea0600e00bcded2a0cb6cfe6d190fa6a980c39/src/atomate2/common/jobs/electrode.py#L303-L318

My current understanding of what is happening is that:

get_charge_density_job calls the VASP version of the function:

https://github.com/materialsproject/atomate2/blob/95ea0600e00bcded2a0cb6cfe6d190fa6a980c39/src/atomate2/vasp/flows/electrode.py#L58-L73

So it will produce a document that contains a VolumetricData/Chgcar object.

It looks like the fact that this result does not have a data=[Chgcar, VolumetricData] in the decorator is the reason I'm seeing the failures.
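To illustrate the idea behind a data=[...] entry in the decorator, here is a toy model of routing outputs by type: values of the listed types go to a separate blob store and only a small reference stays in the main document. This is a simplified sketch, not jobflow's implementation, and ToyChgcar and split_output are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToyChgcar:
    """Stand-in for a large volumetric object (not the pymatgen class)."""

    values: list


def split_output(output: dict, data_types: tuple) -> tuple[dict, dict]:
    """Move values of the listed types to a blob store, leaving a reference behind."""
    main_doc, blobs = {}, {}
    for key, value in output.items():
        if isinstance(value, data_types):
            blob_id = f"blob-{len(blobs)}"
            blobs[blob_id] = value
            main_doc[key] = {"blob_ref": blob_id}
        else:
            main_doc[key] = value
    return main_doc, blobs


output = {"energy": -12.3, "charge_density": ToyChgcar(values=[0.0] * 1000)}
main_doc, blobs = split_output(output, data_types=(ToyChgcar,))
```

Without the type listed, the whole object would stay in the main document, which is consistent with the "Document too large" failure described above.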

PS: Thanks for the fast reply!

jmmshn avatar Jan 07 '25 23:01 jmmshn

Yeah, exactly what I've observed - adding Chgcar and VolumetricData to the data field of the job should work as well, but is there a need for storing the charge density?

esoteric-ephemera avatar Jan 08 '25 01:01 esoteric-ephemera

"but is there a need for storing the charge density?"

I think that is for the case where the different nodes cannot see the same file system. As I understand it, you have to store the charge density somewhere to be able to send it to the next job. That might be a fairly common use case if people use more heterogeneous compute, which is where I think jobflow-remote is going.

That is essentially the tradeoff. This is a substantial amount of new data though, so it might make sense to make

  • "copy file over" the default behavior
  • "send object via jobstore" an optional behavior
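The two behaviors could be sketched roughly as follows, assuming a plain dict as a stand-in for the jobstore. pass_charge_density and its blob_store argument are hypothetical names, not part of atomate2 or jobflow.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def pass_charge_density(
    src: Path, dest_dir: Path, blob_store: dict | None = None
) -> dict:
    """Hand a charge density file to the next job.

    Default: copy the file (needs a shared file system).
    Optional: if a blob store is given, send the contents through it instead.
    """
    if blob_store is None:
        dest = Path(shutil.copy2(src, dest_dir))
        return {"mode": "copy", "path": str(dest)}
    key = f"chg-{len(blob_store)}"
    blob_store[key] = src.read_bytes()
    return {"mode": "store", "key": key}


with tempfile.TemporaryDirectory() as tmp:
    src_dir, dest_dir = Path(tmp) / "static", Path(tmp) / "insert"
    src_dir.mkdir()
    dest_dir.mkdir()
    src = src_dir / "CHGCAR"
    src.write_text("toy density")

    # Default behavior: copy the file over.
    copied = pass_charge_density(src, dest_dir)
    copied_ok = Path(copied["path"]).read_text() == "toy density"

    # Optional behavior: send the object via a store.
    store: dict = {}
    sent = pass_charge_density(src, dest_dir, blob_store=store)
```

Keeping the copy as the default avoids writing extra data for the common shared-file-system case, while the store path stays available for nodes that cannot see each other's files.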

jmmshn avatar Jan 08 '25 15:01 jmmshn

OK - so in the case of heterogeneous compute for a VASP job, we'd have to use something like static_job.output.vasp_objects["aeccar0"] to access the AECCAR0 without making a direct query to the job store. I haven't tried to see if that's possible but I would expect it is?

Either way, I think this is worth looking at in a separate PR - the flow as it currently exists in atomate2 doesn't support heterogeneous compute, this PR is really just to ensure that the flow doesn't fail from mongodb object size limitations

esoteric-ephemera avatar Jan 09 '25 18:01 esoteric-ephemera

OK, everything makes sense now. I agree. I think I was a bit confused because I thought the thing was already removed and somehow made its way back in. But I now see that this PR was not merged. Sorry for the confusion on my part!

jmmshn avatar Jan 09 '25 23:01 jmmshn

@esoteric-ephemera, I added some minor fixes to the job naming/numbering, and I'll try to help get this PR passed.

Thanks for fixing this and helping me clear up my confusion. I think I made a mistake while writing the original wf and missed a place where the data is serialized to the jobstore. I'll fix that, among other things, once this PR is merged!

jmmshn avatar Jan 10 '25 06:01 jmmshn

No worries @jmmshn, that all sounds good to me

esoteric-ephemera avatar Jan 10 '25 16:01 esoteric-ephemera

Hi @esoteric-ephemera, we fixed the abinit tests, I think this should work now.

jmmshn avatar Jan 28 '25 16:01 jmmshn

Hey @JaGeo, @janosh, @utf, should be ready for review. Thanks @jmmshn for input on this!

esoteric-ephemera avatar Jan 28 '25 19:01 esoteric-ephemera

@JaGeo would you mind taking a quick look over the docs changes and see if that's what you had in mind for #1151 and #1152?

The openff tests seem to be OK now with the newer dependency stack, but I marked a few places with TODOs that will have to eventually be updated. Whenever you get a chance @orionarcher, would be super helpful if you could look these over

esoteric-ephemera avatar Apr 24 '25 04:04 esoteric-ephemera

Thank you @esoteric-ephemera ! Those updates are great! (as usual).

JaGeo avatar Apr 24 '25 04:04 JaGeo