keras-neural-graph-fingerprint
Bond representation
In the neural graph fingerprint paper and its implementation, the summed features for a node are the concatenation of the summed atom features of its neighbours and the summed bond features of the respective bonds.
In Duvenaud's implementation, the summed bond features are stored in a separate matrix to speed up computation.
This is memory efficient but inadequate from a graph perspective.
There are only a few bond types, and I want to experiment with bond-type-dependent weights (rather than degree-dependent weights, as is currently suggested). To play around with this, the NeuralGraphLayers need access to the bond-type information per bond; the summed information is no longer sufficient.
There are two obvious ways to represent bond-type information in the current framework:
- Bond-type neighbour matrix: There are only a few discrete bond types, so a possible solution would be an `atom x atom` matrix for each molecule, with integer values that represent the bond type (0 for no bond). A downside of this is that it will be harder to extend the bond features (e.g. to include distance).
- Bond-feature neighbour tensor: An extension of the bond-type neighbour matrix could be an `atom x atom x feature` tensor that stores the bond features for each bond.
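To make the two options concrete, here is a minimal NumPy sketch that builds both representations for a single toy molecule. The bond list, the integer type codes (`SINGLE`, `DOUBLE`), the helper names and the one-hot feature layout are all assumptions made for illustration; none of them are taken from the existing code.

```python
import numpy as np

# Hypothetical toy molecule with three heavy atoms: 0-1 single, 1-2 double.
# Each bond is (atom_i, atom_j, bond_type); the type codes are assumptions.
SINGLE, DOUBLE = 1, 2
bonds = [(0, 1, SINGLE), (1, 2, DOUBLE)]
n_atoms = 3
n_bond_types = 2


def bond_type_matrix(bonds, n_atoms):
    """Integer atom x atom matrix; 0 means no bond."""
    mat = np.zeros((n_atoms, n_atoms), dtype=np.int8)
    for i, j, bond_type in bonds:
        mat[i, j] = bond_type
        mat[j, i] = bond_type  # undirected, so both directions are stored
    return mat


def bond_feature_tensor(bonds, n_atoms, n_bond_types):
    """atom x atom x feature tensor; the features here are a one-hot bond type."""
    tensor = np.zeros((n_atoms, n_atoms, n_bond_types), dtype=np.float32)
    for i, j, bond_type in bonds:
        tensor[i, j, bond_type - 1] = 1.0
        tensor[j, i, bond_type - 1] = 1.0
    return tensor


type_mat = bond_type_matrix(bonds, n_atoms)
feat_tensor = bond_feature_tensor(bonds, n_atoms, n_bond_types)
```

In this sketch the tensor could be extended with extra feature channels (such as a distance) by widening the last axis, whereas the integer matrix can only carry one discrete code per bond.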
A downside of both approaches is that they are suboptimal for the regular Duvenaud algorithm. For each hidden layer and batch, the neighbouring atom features will have to be summed. This computation is exactly the same for each layer. On the other hand, perhaps Theano's optimisation will figure this out (?).
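A rough NumPy sketch of the summation in question, assuming a single molecule and a dense bond-feature tensor in which unbonded pairs are all zeros and every real bond has at least one non-zero feature. The shapes and variable names are illustrative only. Under these assumptions the summed bond features and the adjacency mask do not depend on the layer, while the summed neighbour atom features depend on whatever atom features the current layer receives.

```python
import numpy as np

# Toy inputs (assumed shapes): 3 atoms, 2 bond features, 4 atom features.
# bond_tensor[i, j] holds the features of bond i-j, or zeros if unbonded.
bond_tensor = np.zeros((3, 3, 2), dtype=np.float32)
bond_tensor[0, 1] = bond_tensor[1, 0] = [1.0, 0.0]  # bond 0-1
bond_tensor[1, 2] = bond_tensor[2, 1] = [0.0, 1.0]  # bond 1-2

atom_features = np.arange(12, dtype=np.float32).reshape(3, 4)

# Summed bond features per atom: sum over the neighbour axis.
# This uses only the bond tensor, so it is the same for every layer.
summed_bond_features = bond_tensor.sum(axis=1)  # (atoms, bond_features)

# Adjacency mask: 1 where any bond feature is non-zero (see assumption above).
adjacency = (bond_tensor != 0).any(axis=2).astype(np.float32)

# Summed neighbour atom features for the current layer's input.
summed_atom_features = adjacency @ atom_features  # (atoms, atom_features)

# Concatenate the two sums into the per-node summed features.
summed_features = np.concatenate(
    [summed_atom_features, summed_bond_features], axis=1
)
```

If the layer-independent parts turn out to be recomputed per layer in practice, one option would be to compute `summed_bond_features` and `adjacency` once per batch and pass them to every layer.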
Both representations will also require more memory, especially the bond-feature neighbour tensor. In these graphs the edges are undirected, so the bond feature information will take up twice as much space as required. I will have to run benchmarks to see what is acceptable.
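As a back-of-the-envelope check before benchmarking, the sketch below compares the storage of a dense bond-feature tensor with the per-atom summed matrix, and with a layout that stores each undirected atom pair only once. The batch size, padding length, feature count and 4-byte floats are hypothetical numbers chosen for illustration, not measurements.

```python
def dense_bytes(n_molecules, max_atoms, n_bond_features, itemsize=4):
    """Bytes for a dense (molecule, atom, atom, feature) tensor."""
    return n_molecules * max_atoms * max_atoms * n_bond_features * itemsize


def summed_bytes(n_molecules, max_atoms, n_bond_features, itemsize=4):
    """Bytes for per-atom summed bond features (molecule, atom, feature)."""
    return n_molecules * max_atoms * n_bond_features * itemsize


def upper_triangle_bytes(n_molecules, max_atoms, n_bond_features, itemsize=4):
    """Bytes if each undirected atom pair (i < j) were stored only once."""
    n_pairs = max_atoms * (max_atoms - 1) // 2
    return n_molecules * n_pairs * n_bond_features * itemsize


# Hypothetical batch: 1000 molecules padded to 50 atoms, 6 bond features.
args = (1000, 50, 6)
dense = dense_bytes(*args)
summed = summed_bytes(*args)
triangle = upper_triangle_bytes(*args)
print(dense / 1e6, summed / 1e6, triangle / 1e6)  # sizes in MB
```

For these assumed numbers the dense tensor needs 60 MB, the summed matrix 1.2 MB, and the once-per-pair layout 29.4 MB, so the dense tensor costs a factor of `max_atoms` more than the summed matrix and roughly twice the once-per-pair layout.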