question about the allowed bond valence in stability check
Thank you for the great implementation of flow matching in molecule generation!
I have a question regarding the allowed bond valence settings in the code.
Specifically, in this part of the code, I noticed that:
- A neutral carbon (zero charge) is allowed to have a valence of 3.
- A neutral nitrogen (zero charge) is allowed to have a valence of 2.
From a chemistry perspective, these configurations do not seem stable. Could you clarify the reasoning behind these valence choices? Further, after I removed choice 3 in the carbon valence check, I found the molecular stability dropped a lot, from 99.3% to 77.6%.
Post-edit: the 77.6% is obtained under the following allowed bonds:
allowed_bonds = {'H': {0: 1, 1: 0, -1: 0},
'C': {0: 4, 1: 3, -1: 3},
'N': {0: [3], 1: [4], -1: 2},
'O': {0: 2, 1: 3, -1: 1},...
Nice catch! So the reason its in our code is that we took the valency look-up table from MiDi. Here's how I think these strange valencies got there:
It turns out there is a bug in how valency is calculated that I believe started in MiDi but has propagated to several subsequent works, ours included. This bug was found by Filipp Nikitin ( @FilippNikitin ) and he's raised github issues explaining it in a few places:
- https://github.com/cvignac/MiDi/issues/12
- https://github.com/tuanle618/eqgat-diff/issues/2
- https://github.com/rssrwn/semla-flow/issues/4
Under this bug, a neutral, aromatic carbon that has two aromatic bonds and a single bond (like a carbon in a benzene), would be counted as having a valency of 3 (even though the correct valency is 4). A neutral SP2 nitrogen participating in an aromatic ring (such as pyridine) would be counted as having a valency of 2 (the correct valency is 3).
So, my guess is that the valency lookup table was constructed using valencies found in the data, but where the valencies were computed with the bug described in those github issues.
The reason the stability metric drops so much when you remove the valence-3 carbon and valence-2 nitrogen is because the valency computation and look-up table are interdependent (the look-up table is designed for buggy valency computation).
I'm working towards releasing a third (and hopefully final) iteration of FlowMol and I plan on fixing the valency lookup table in it.
Thank you for your prompt response and detailed explanation!
I understand that the dtype issue could contribute to the performance drop when I modified the allowed bond dictionary. However, I found that most of the unstable molecules—such as those where carbon has three valences—are not due to their involvement in aromatic systems.
I checked the unstable molecules generated by the qm9_ctmc model and noticed that many of them exhibit this issue, despite clearly not being part of aromatic systems:
I cannot evaluate whether this is simply how the model behaves or if there is something going wrong in your environment from just a few cherry-picked examples.
Starting from the main branch, if I set the lookup table to be this:
allowed_bonds = {'H': {0: 1, 1: 0, -1: 0},
'C': {0: [4], 1: 3, -1: 3},
'N': {0: [3], 1: [2, 3, 4], -1: 2}, # In QM9, N+ seems to be present in the form NH+ and NH2+
'O': {0: 2, 1: 3, -1: 1},
'F': {0: 1, -1: 0},
'B': 3, 'Al': 3, 'Si': 4,
'P': {0: [3, 5], 1: 4},
'S': {0: [2, 6], 1: [2, 3], 2: 4, 3: 5, -1: 3},
'Cl': 1, 'As': 3,
'Br': {0: 1, 1: 2}, 'I': 1, 'Hg': [1, 2], 'Bi': [3, 5], 'Se': [2, 4, 6]}
and run the following in a jupyter notebook:
import flowmol
from flowmol.analysis.metrics import SampleAnalyzer
model = flowmol.load_pretrained('qm9_ctmc').cuda().eval()
analyzer = SampleAnalyzer()
sampled_mols = model.sample_random_sizes(n_molecules=1000)
analyzer.analyze(sampled_mols)
I get the following output:
{'frac_atoms_stable': 0.9787860276559116,
'frac_mols_stable_valence': 0.856,
'frac_valid_mols': 0.991,
'avg_frag_frac': 0.9999565217391305,
'avg_num_components': 1.001}
The stability I see here (86%) is different than the stability you quoted in your first post so there could be something different about your environment or how you're running inference.
Those numbers above^ still use the incorrect valency calculation, so stable atoms in aromatic systems would in theory be marked as unstable. If I fix the valency computation and then re-run the cell above I get the following output:
{'frac_atoms_stable': 0.9911190053285968,
'frac_mols_stable_valence': 0.87,
'frac_valid_mols': 0.993,
'avg_frag_frac': 1.0,
'avg_num_components': 1.0}
These results are virtually the same as the first set of results. This would suggest that participation in aromatic rings is not a significant factor here (not surprising as I think there are no or virtually no aromatic bonds in QM9). So the gap from 99.3% in the paper to ~86% seen here is probably due to ~13% of molecules having neutral 3-valence carbons or neutral 2-valence nitrogens.
So the current situation is:
- neutral 3-valence carbon and neutral 2-valence nitrogen appear in the valid valence lookup table likely due to a bug in valency computation
- but the presence of these entries in the valid valence lookup table also obscures instances where the model is genuinely failing to produce valid valencies
Sorry for not mentioning that I actually also changed the allowed bond valence for nitrogen when its charge is +1. I think if the charge is +1, then the allowed valence should only be 4, rather than [2, 3, 4].
By the way, from my perspective, I think H-, C-, and C+ are actually not stable.
So the current situation is:
- neutral 3-valence carbon and neutral 2-valence nitrogen appear in the valid valence lookup table likely due to a bug in valency computation
- but the presence of these entries in the valid valence lookup table also obscures instances where the model is genuinely failing to produce valid valencies
I agree on these points. But I think the atom-type diffusion process may also influence it, since in many cases if we change carbon to nitrogen, then the valence will be correct. Meanwhile, the geometry of unstable carbon also seems to be more like nitrogen.
So the current situation is:
- neutral 3-valence carbon and neutral 2-valence nitrogen appear in the valid valence lookup table likely due to a bug in valency computation
- but the presence of these entries in the valid valence lookup table also obscures instances where the model is genuinely failing to produce valid valencies
I agree on these points. But I think the atom-type diffusion process may also influence it, since in many cases if we change carbon to nitrogen, then the valence will be correct. Meanwhile, the geometry of unstable carbon also seems to be more like nitrogen.
Hi,
I'm also re-processing QM9 and GEOM. These 2 datasets contains super weird charges, e.g., a benzene ring with 6 C+/C-. I'm now re-processing it. The allowed charges are:
- O -1
- N +1 if not in aromatic ring
- N +1 if in aromatic ring and with valence 4
Besides, can @Dunni3 you provide the original data split indices for GEOM? Since I'm re-processing it, I want to keep the same split as you and MiDi.
Please help.
Thanks
We're hosting the raw files here. The directions in the readme point to this same location. These files are the starting point for our pipeline (process_geom.py). I will note that these are not the true raw files from the original GEOM dataset. I make this point because I don't know precisely how the authors of MiDi made these files so its possible they did something more than just copying over rdkit molecule objects.
I suggest setting the "valid valencies" look-up table to be a table that you create by parsing the type/charge/valency combinations observed in the training data. This is the most accurate way to measure if the model is producing something that is outside of its training distribution.
We're hosting the raw files here. The directions in the readme point to this same location. These files are the starting point for our pipeline (
process_geom.py). I will note that these are not the true raw files from the original GEOM dataset. I make this point because I don't know precisely how the authors of MiDi made these files so its possible they did something more than just copying over rdkit molecule objects.I suggest setting the "valid valencies" look-up table to be a table that you create by parsing the type/charge/valency combinations observed in the training data. This is the most accurate way to measure if the model is producing something that is outside of its training distribution.
Thanks. I find the process code from MiDi https://github.com/cvignac/MiDi/blob/775b731c38967a1e49615b2ad70ac6b5db24909a/midi/datasets/geom_preprocessing.py#L13
However, I don't know how is the rdkit_mols.pickle merged from seperate files. I think it may not be possible to reproduce the same data split. What do you think of I re-split with same split ratio? Is taht fair enough?
Thanks
We recently wrote a paper about issues with the "stability" metric commonly used, including the one being discussed here. We propose a few solutions and provide some code for them.
I also fixed the valency computation in FlowMol3; the third version of FlowMol that will be imminently released. So I'm going to mark this issue as resolved and close it.
Our new paper is here:
Nikitin, F., Dunn, I., Koes, D. R. & Isayev, O. GEOM-Drugs Revisited: Toward More Chemically Accurate Benchmarks for 3D Molecule Generation. Preprint at https://doi.org/10.48550/arXiv.2505.00169 (2025).
And the code can be found here: https://github.com/isayevlab/geom-drugs-3dgen-evaluation