diffusers icon indicating copy to clipboard operation
diffusers copied to clipboard

Molecule generation model (GeoDiff)

Open natolambert opened this issue 3 years ago • 23 comments

Added a new model file to add functionality for this paper that does molecule generation via diffusion.

Pretrained models available are for two tasks, drugs and qm9:

model1 = DualEncoderEpsNetwork.from_pretrained("natolambert/geodiff-qm9-dualenc")
model1 = DualEncoderEpsNetwork.from_pretrained("natolambert/geodiff-drugs-dualenc")

Will work on additional examples for this model and update this PR. Some todo items:

  • [x] setup colab dependencies to run model: is here
  • [x] add model tests
  • [x] update colab
  • [ ] add documentation for model
  • [x] rename model to MoleculeGNN
  • [x] add dependency checks for torch_geometric<2, pytorch 1.8, and torch_scatter (a recommended installation method is in the colab)

Some comments:

  • the GNN implementations came from ConfGF
  • an issue to open after release will be to figure out how to port the existing models to the new version of pytorch geometric. The author is not sure if one can copy the parameter dict or if re-training will be needed. This will ease some of the specific requirements for this model.

natolambert avatar Jun 30 '22 22:06 natolambert

Doing more digging, the model architecture is based on GIN and SchNet (common graph neural networks).

There are local and global parameters of the molecule, and different components use different GNNs.

natolambert avatar Jul 22 '22 22:07 natolambert

Hey @natolambert,

This PR looks super cool. I think we can merge it quite quickly! Leaving some feedback directly in the code. Regarding the notebook, the API: In general, the notebook looks very nice IMO. If we have to install certain dependencies so be it! Regarding the model API I would suggest to change it a bit. E.g.:

# generate geometry with model, then filter it
model_outputs = model.forward(batch, t)

# this model uses additional conditioning of the outputs depending on the current timestep
epsilon = model.get_residual(pos, sigmas[t], model_outputs)["sample"]

Do you think we can merge this into a single forward pass e.g. something like:

epsilon = model(batch, t, sigma)["sample"]

?

patrickvonplaten avatar Jul 25 '22 10:07 patrickvonplaten

I'll work through your comments soon @patrickvonplaten. CC the original author @MinkaiXu in case he has any time to look.

natolambert avatar Jul 25 '22 14:07 natolambert

We should be able to add it now that things are calmer :partying_face:

patrickvonplaten avatar Aug 23 '22 17:08 patrickvonplaten

@natolambert @clefourrier @patrickvonplaten any plan to integrate this PR?

georgosgeorgos avatar Sep 08 '22 12:09 georgosgeorgos

I took a quick look and everything is pretty good (Thank you all for your efforts!) Ping me if there is any part I need to take a deeper look before merging the PR :)

Specifically thanks @natolambert a lot for discussing many details together!

MinkaiXu avatar Sep 08 '22 19:09 MinkaiXu

@georgosgeorgos and @MinkaiXu, sorry for the delay on merging. We got a little distracted by Stable Diffusion. I'll plan on merging the updates on main to this + the notebook then we should be able to close the PR soon.

The colab should run in its current form! So if you look at that, happy to take comments!

natolambert avatar Sep 08 '22 20:09 natolambert

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Oct 03 '22 15:10 github-actions[bot]

Fighting the stale bot! We haven't forgotten about this, and actually moved it up the priority list today. Soon!

natolambert avatar Oct 03 '22 15:10 natolambert

Alright, I kind of messed up that rebase so there are some repeated commits in the log. The changes are all here. This is much closer to release now.

Unfortunately, there are some breaking issues right now. The primary one is that torch 1.8 (which is required to use the version of torch_geometric this was built on, does not support @torch.inference_mode, so importing the safety_checker.py fails.

This is something I don't really want to mess with. @patrickvonplaten or @MinkaiXu do you have any comments / suggestions to get by this (tagged for different reasons -- @patrickvonplaten taught me the version checking on input and @MinkaiXu knows those versions of torch).

When this is fixed, I will update the colab (which currently will error on install of diffusers).

natolambert avatar Oct 03 '22 19:10 natolambert

I'm glad to help, but not quite familiar with the whole process --- safety_checker.py is some mandatory step for diffusers? If so, is there a source code for this file?

MinkaiXu avatar Oct 03 '22 21:10 MinkaiXu

@MinkaiXu, @patrickvonplaten moved fast and removed it here. An interesting little difference I am not aware of :)

natolambert avatar Oct 03 '22 21:10 natolambert

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@patrickvonplaten @anton-l @patil-suraj: so I realized the tests I implemented are not that useful because in order to test them you need:

  • pytorch 1.8,
  • torch_geometric 1.7.2

These are kind of a pain to install with conda install -c rusty1s pytorch-geometric=1.7.2 after installing pytorch==1.8 from source.

How should we think about integrating these tests?

natolambert avatar Oct 03 '22 22:10 natolambert

Why do we need to install pytorch from source ? Also is torch_geometric not available through pip ?

patil-suraj avatar Oct 05 '22 10:10 patil-suraj

@patil-suraj TLDR is re-implemented a research paper's code and it was made on a version before a lot of breaking changes. Has made this pr-for-a-colab a bit unwieldy.

I'm actually not sure it any version of torch_geometric is available on pip, I don't think so.

natolambert avatar Oct 05 '22 14:10 natolambert

Do we already have a working google colab for this model. Also if a model is simply to difficult to implement, we might also only add the part that is easy to add to diffusers and for the rest just rely on an example

patrickvonplaten avatar Oct 07 '22 13:10 patrickvonplaten

Oh yeah I can just put it all in the colab and not port it into diffusers. May be easier.

In summary, I would merge in sigmoid noise schedule and then just copy the model into the colab. Thoughts?

natolambert avatar Oct 07 '22 15:10 natolambert

sounds good!

patrickvonplaten avatar Oct 07 '22 17:10 patrickvonplaten

We closed this PR in favor of an colab-only solution unless someone has time to update the source model to the new versions of torch_geometric so it doesn't require very different dependencies!

natolambert avatar Oct 07 '22 20:10 natolambert

Do we already have a working google colab for this model. Also if a model is simply to difficult to implement, we might also only add the part that is easy to add to diffusers and for the rest just rely on an example

@patrickvonplaten @natolambert where is the working colab with geodiff in the diffusers repo?

georgosgeorgos avatar Oct 10 '22 08:10 georgosgeorgos

Let's leave it open until we have a colab version. We can also make use of community pipelines: https://huggingface.co/docs/diffusers/using-diffusers/custom_pipelines :-)

patrickvonplaten avatar Oct 10 '22 13:10 patrickvonplaten

Created a PR in notebooks and the colab is here -- need to update the link once the PR in notebooks is merged.

natolambert avatar Oct 10 '22 17:10 natolambert

Hey! Thanks for adding GeoDiff into the pipeline :D

TorsionDiff (https://arxiv.org/abs/2206.01729) might be another cool approach for the library. It requires fewer diffusion steps for conformational generation than GeoDiff!

code: https://github.com/gcorso/torsional-diffusion

rish-16 avatar Feb 15 '23 09:02 rish-16