diffusers Molecule generation model (GeoDiff)

Added a new model file to add functionality for this paper that does molecule generation via diffusion.

Pretrained models available are for two tasks, drugs and qm9:

model1 = DualEncoderEpsNetwork.from_pretrained("natolambert/geodiff-qm9-dualenc")
model1 = DualEncoderEpsNetwork.from_pretrained("natolambert/geodiff-drugs-dualenc")

Will work on additional examples for this model and update this PR. Some todo items:

[x] setup colab dependencies to run model: is here
[x] add model tests
[x] update colab
[ ] add documentation for model
[x] rename model to MoleculeGNN
[x] add dependency checks for torch_geometric<2, pytorch 1.8, and torch_scatter (a recommended installation method is in the colab)

Some comments:

the GNN implementations came from ConfGF
an issue to open after release will be to figure out how to port the existing models to the new version of pytorch geometric. The author is not sure if one can copy the parameter dict or if re-training will be needed. This will ease some of the specific requirements for this model.

Jun 30 '22 22:06 natolambert

Doing more digging, the model architecture is based on GIN and SchNet (common graph neural networks).

There are local and global parameters of the molecule, and different components use different GNNs.

Jul 22 '22 22:07 natolambert

Hey @natolambert,

This PR looks super cool. I think we can merge it quite quickly! Leaving some feedback directly in the code. Regarding the notebook, the API: In general, the notebook looks very nice IMO. If we have to install certain dependencies so be it! Regarding the model API I would suggest to change it a bit. E.g.:

# generate geometry with model, then filter it
model_outputs = model.forward(batch, t)

# this model uses additional conditioning of the outputs depending on the current timestep
epsilon = model.get_residual(pos, sigmas[t], model_outputs)["sample"]

Do you think we can merge this into a single forward pass e.g. something like:

epsilon = model(batch, t, sigma)["sample"]

?

Jul 25 '22 10:07 patrickvonplaten

I'll work through your comments soon @patrickvonplaten. CC the original author @MinkaiXu in case he has any time to look.

Jul 25 '22 14:07 natolambert

We should be able to add it now that things are calmer :partying_face:

Aug 23 '22 17:08 patrickvonplaten

@natolambert @clefourrier @patrickvonplaten any plan to integrate this PR?

Sep 08 '22 12:09 georgosgeorgos

I took a quick look and everything is pretty good (Thank you all for your efforts!) Ping me if there is any part I need to take a deeper look before merging the PR :)

Specifically thanks @natolambert a lot for discussing many details together!

Sep 08 '22 19:09 MinkaiXu

@georgosgeorgos and @MinkaiXu, sorry for the delay on merging. We got a little distracted by Stable Diffusion. I'll plan on merging the updates on main to this + the notebook then we should be able to close the PR soon.

The colab should run in its current form! So if you look at that, happy to take comments!

Sep 08 '22 20:09 natolambert

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

Oct 03 '22 15:10 github-actions[bot]

Fighting the stale bot! We haven't forgotten about this, and actually moved it up the priority list today. Soon!

Oct 03 '22 15:10 natolambert

Alright, I kind of messed up that rebase so there are some repeated commits in the log. The changes are all here. This is much closer to release now.

Unfortunately, there are some breaking issues right now. The primary one is that torch 1.8 (which is required to use the version of torch_geometric this was built on, does not support @torch.inference_mode, so importing the safety_checker.py fails.

This is something I don't really want to mess with. @patrickvonplaten or @MinkaiXu do you have any comments / suggestions to get by this (tagged for different reasons -- @patrickvonplaten taught me the version checking on input and @MinkaiXu knows those versions of torch).

When this is fixed, I will update the colab (which currently will error on install of diffusers).

Oct 03 '22 19:10 natolambert

I'm glad to help, but not quite familiar with the whole process --- safety_checker.py is some mandatory step for diffusers? If so, is there a source code for this file?

Oct 03 '22 21:10 MinkaiXu

@MinkaiXu, @patrickvonplaten moved fast and removed it here. An interesting little difference I am not aware of :)

Oct 03 '22 21:10 natolambert

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Oct 03 '22 22:10 HuggingFaceDocBuilderDev

@patrickvonplaten @anton-l @patil-suraj: so I realized the tests I implemented are not that useful because in order to test them you need:

pytorch 1.8,
torch_geometric 1.7.2

These are kind of a pain to install with conda install -c rusty1s pytorch-geometric=1.7.2 after installing pytorch==1.8 from source.

How should we think about integrating these tests?

Oct 03 '22 22:10 natolambert

Why do we need to install pytorch from source ? Also is torch_geometric not available through pip ?

Oct 05 '22 10:10 patil-suraj

@patil-suraj TLDR is re-implemented a research paper's code and it was made on a version before a lot of breaking changes. Has made this pr-for-a-colab a bit unwieldy.

I'm actually not sure it any version of torch_geometric is available on pip, I don't think so.

Oct 05 '22 14:10 natolambert

Do we already have a working google colab for this model. Also if a model is simply to difficult to implement, we might also only add the part that is easy to add to diffusers and for the rest just rely on an example

Oct 07 '22 13:10 patrickvonplaten

Oh yeah I can just put it all in the colab and not port it into diffusers. May be easier.

In summary, I would merge in sigmoid noise schedule and then just copy the model into the colab. Thoughts?

Oct 07 '22 15:10 natolambert

sounds good!

Oct 07 '22 17:10 patrickvonplaten

We closed this PR in favor of an colab-only solution unless someone has time to update the source model to the new versions of torch_geometric so it doesn't require very different dependencies!

Oct 07 '22 20:10 natolambert

Do we already have a working google colab for this model. Also if a model is simply to difficult to implement, we might also only add the part that is easy to add to diffusers and for the rest just rely on an example

@patrickvonplaten @natolambert where is the working colab with geodiff in the diffusers repo?

Oct 10 '22 08:10 georgosgeorgos

Let's leave it open until we have a colab version. We can also make use of community pipelines: https://huggingface.co/docs/diffusers/using-diffusers/custom_pipelines :-)

Oct 10 '22 13:10 patrickvonplaten

Created a PR in notebooks and the colab is here -- need to update the link once the PR in notebooks is merged.

Oct 10 '22 17:10 natolambert

Hey! Thanks for adding GeoDiff into the pipeline :D

TorsionDiff (https://arxiv.org/abs/2206.01729) might be another cool approach for the library. It requires fewer diffusion steps for conformational generation than GeoDiff!

code: https://github.com/gcorso/torsional-diffusion

Feb 15 '23 09:02 rish-16