Molecule generation model (GeoDiff)
Added a new model file to add functionality for this paper that does molecule generation via diffusion.
Pretrained models available are for two tasks, drugs and qm9:
model1 = DualEncoderEpsNetwork.from_pretrained("natolambert/geodiff-qm9-dualenc")
model1 = DualEncoderEpsNetwork.from_pretrained("natolambert/geodiff-drugs-dualenc")
Will work on additional examples for this model and update this PR. Some todo items:
- [x] setup colab dependencies to run model: is here
- [x] add model tests
- [x] update colab
- [ ] add documentation for model
- [x] rename model to
MoleculeGNN - [x] add dependency checks for
torch_geometric<2,pytorch 1.8, andtorch_scatter(a recommended installation method is in the colab)
Some comments:
- the GNN implementations came from ConfGF
- an issue to open after release will be to figure out how to port the existing models to the new version of pytorch geometric. The author is not sure if one can copy the parameter dict or if re-training will be needed. This will ease some of the specific requirements for this model.
Doing more digging, the model architecture is based on GIN and SchNet (common graph neural networks).
There are local and global parameters of the molecule, and different components use different GNNs.
Hey @natolambert,
This PR looks super cool. I think we can merge it quite quickly! Leaving some feedback directly in the code. Regarding the notebook, the API: In general, the notebook looks very nice IMO. If we have to install certain dependencies so be it! Regarding the model API I would suggest to change it a bit. E.g.:
# generate geometry with model, then filter it
model_outputs = model.forward(batch, t)
# this model uses additional conditioning of the outputs depending on the current timestep
epsilon = model.get_residual(pos, sigmas[t], model_outputs)["sample"]
Do you think we can merge this into a single forward pass e.g. something like:
epsilon = model(batch, t, sigma)["sample"]
?
I'll work through your comments soon @patrickvonplaten. CC the original author @MinkaiXu in case he has any time to look.
We should be able to add it now that things are calmer :partying_face:
@natolambert @clefourrier @patrickvonplaten any plan to integrate this PR?
I took a quick look and everything is pretty good (Thank you all for your efforts!) Ping me if there is any part I need to take a deeper look before merging the PR :)
Specifically thanks @natolambert a lot for discussing many details together!
@georgosgeorgos and @MinkaiXu, sorry for the delay on merging. We got a little distracted by Stable Diffusion. I'll plan on merging the updates on main to this + the notebook then we should be able to close the PR soon.
The colab should run in its current form! So if you look at that, happy to take comments!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Fighting the stale bot! We haven't forgotten about this, and actually moved it up the priority list today. Soon!
Alright, I kind of messed up that rebase so there are some repeated commits in the log. The changes are all here. This is much closer to release now.
Unfortunately, there are some breaking issues right now. The primary one is that torch 1.8 (which is required to use the version of torch_geometric this was built on, does not support @torch.inference_mode, so importing the safety_checker.py fails.
This is something I don't really want to mess with. @patrickvonplaten or @MinkaiXu do you have any comments / suggestions to get by this (tagged for different reasons -- @patrickvonplaten taught me the version checking on input and @MinkaiXu knows those versions of torch).
When this is fixed, I will update the colab (which currently will error on install of diffusers).
I'm glad to help, but not quite familiar with the whole process --- safety_checker.py is some mandatory step for diffusers? If so, is there a source code for this file?
@MinkaiXu, @patrickvonplaten moved fast and removed it here. An interesting little difference I am not aware of :)
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
@patrickvonplaten @anton-l @patil-suraj: so I realized the tests I implemented are not that useful because in order to test them you need:
- pytorch 1.8,
- torch_geometric 1.7.2
These are kind of a pain to install with conda install -c rusty1s pytorch-geometric=1.7.2 after installing pytorch==1.8 from source.
How should we think about integrating these tests?
Why do we need to install pytorch from source ? Also is torch_geometric not available through pip ?
@patil-suraj TLDR is re-implemented a research paper's code and it was made on a version before a lot of breaking changes. Has made this pr-for-a-colab a bit unwieldy.
I'm actually not sure it any version of torch_geometric is available on pip, I don't think so.
Do we already have a working google colab for this model. Also if a model is simply to difficult to implement, we might also only add the part that is easy to add to diffusers and for the rest just rely on an example
Oh yeah I can just put it all in the colab and not port it into diffusers. May be easier.
In summary, I would merge in sigmoid noise schedule and then just copy the model into the colab. Thoughts?
sounds good!
We closed this PR in favor of an colab-only solution unless someone has time to update the source model to the new versions of torch_geometric so it doesn't require very different dependencies!
Do we already have a working google colab for this model. Also if a model is simply to difficult to implement, we might also only add the part that is easy to add to diffusers and for the rest just rely on an example
@patrickvonplaten @natolambert where is the working colab with geodiff in the diffusers repo?
Let's leave it open until we have a colab version. We can also make use of community pipelines: https://huggingface.co/docs/diffusers/using-diffusers/custom_pipelines :-)
Created a PR in notebooks and the colab is here -- need to update the link once the PR in notebooks is merged.
Hey! Thanks for adding GeoDiff into the pipeline :D
TorsionDiff (https://arxiv.org/abs/2206.01729) might be another cool approach for the library. It requires fewer diffusion steps for conformational generation than GeoDiff!