torchchem

Data structure for molecules

Open miaecle opened this issue 4 years ago • 3 comments

I noticed some existing code carried over from previous work. Since we are heading towards using torch_geometric, which already has fairly complete data structures for graphs, should we just use those directly? Otherwise, I suppose we need to write code to port our structures to torch_geometric-compatible ones.

Also, the naming is not very intuitive: neural_fp.py is mostly about mol-graph data structures, and I didn't see an ECFP (or other fingerprint) function in it. transformer.py defines many torch nn modules, many of which I believe can be found in torch_geometric.

I suggest organizing things the same way as deepchem, e.g. dividing into subfolders for data manipulation, fingerprints, nn models, etc.

miaecle, Mar 18 '20 06:03

@miaecle This seems like a good idea! Would you have any pytorch-geometric example code that does this?

Sorry, I copied in code from a couple of sources as a starting point, so it's all jumbled together. I have an open PR that cleans it up a bit into the deepchem structure. I'll go ahead and merge that in so the repo looks a little cleaner.

The neural_fp.py and transformer.py code is originally from https://github.com/gmum/MAT, so it hasn't been reworked yet.

rbharath, Mar 18 '20 16:03

@rbharath Sounds good, I can work on some basic data functions to port things into torch_geometric.
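
For illustration only, here is a rough sketch of what such a port might look like. It is not code from this repo: `MolGraph` and `to_coo` are hypothetical names, and the only node feature is the atomic number. It relies on torch_geometric's `Data` object taking node features `x` of shape `[num_nodes, num_features]` and an `edge_index` of shape `[2, num_edges]`, with each undirected bond stored in both directions. The conversion itself is written in plain Python so it runs without torch installed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MolGraph:
    """Hypothetical stand-in for an in-house molecule graph (not from torchchem)."""
    atomic_numbers: List[int]      # one entry per atom
    bonds: List[Tuple[int, int]]   # undirected bonds as (atom_i, atom_j)


def to_coo(mol: MolGraph):
    """Return (x, edge_index) in the layout torch_geometric expects.

    x:          [num_atoms, 1] node features (here just the atomic number)
    edge_index: [2, 2 * num_bonds], each undirected bond listed in both directions
    """
    x = [[float(z)] for z in mol.atomic_numbers]
    sources, targets = [], []
    for i, j in mol.bonds:
        sources += [i, j]
        targets += [j, i]
    return x, [sources, targets]


# Heavy atoms of ethanol: C-C-O
ethanol = MolGraph(atomic_numbers=[6, 6, 8], bonds=[(0, 1), (1, 2)])
x, edge_index = to_coo(ethanol)

# With torch and torch_geometric installed, the lists can be wrapped as:
#   import torch
#   from torch_geometric.data import Data
#   data = Data(x=torch.tensor(x, dtype=torch.float),
#               edge_index=torch.tensor(edge_index, dtype=torch.long))
```

Bond features (e.g. bond order) could be carried the same way through `edge_attr`, with one row per directed edge.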

miaecle, Mar 19 '20 09:03

@miaecle Great! That would be a very valuable contribution :)

rbharath, Mar 19 '20 22:03