openprotein

soft_to_angle theory

Open TrentBrick opened this issue 5 years ago • 4 comments

Would it be possible to share a link to the research or reasoning behind the soft_to_angle Module for someone new to structural protein problems?

My current hunch is that you have run a mixture model on the Pfam database and found the average angle conformations of the different families. You then use a LogSoftmax activation function to let each amino acid choose which of the omega, psi and phi angles it wants from this table of options, and then use sin, cos and arctan to convert these values into angles? Why does the mixture model have 500 clusters, how was the mixture_model table generated, and why is there a 90:10 pos/neg omega ratio that is then randomly shuffled in?
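A minimal sketch of the idea described above, assuming a softmax over a small table of candidate angles followed by a circular (sin/cos) average. This is not the repository's code: the function names, the three-entry alphabet and the logits are all hypothetical, and plain Python is used instead of PyTorch to keep the example self-contained.

```python
import math


def softmax(logits):
    """Turn unnormalised scores into weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_to_angle_sketch(logits, alphabet):
    """Blend K candidate (phi, psi, omega) triples into a single triple.

    logits:   K scores for one residue (hypothetical network output)
    alphabet: K tuples of (phi, psi, omega) in radians (hypothetical table)
    """
    weights = softmax(logits)
    blended = []
    for axis in range(3):
        # Average on the unit circle: weighted sum of sines and cosines,
        # then recover the angle with atan2.
        s = sum(w * math.sin(entry[axis]) for w, entry in zip(weights, alphabet))
        c = sum(w * math.cos(entry[axis]) for w, entry in zip(weights, alphabet))
        blended.append(math.atan2(s, c))
    return tuple(blended)


# Toy alphabet with three entries (illustrative values only).
alphabet = [
    (math.radians(-60.0), math.radians(-45.0), math.radians(180.0)),
    (math.radians(-120.0), math.radians(130.0), math.radians(180.0)),
    (math.radians(60.0), math.radians(45.0), math.radians(0.0)),
]

# A residue that strongly prefers the first entry.
phi, psi, omega = soft_to_angle_sketch([50.0, 0.0, 0.0], alphabet)

# Equal weights on +170 and -170 degrees: a circular average lands at
# +/-180 degrees, whereas a naive arithmetic mean would give 0 degrees.
wrapped = soft_to_angle_sketch(
    [0.0, 0.0],
    [(math.radians(170.0), 0.0, 0.0), (math.radians(-170.0), 0.0, 0.0)],
)[0]
```

Going through sin and cos rather than averaging the raw angles is what keeps the result sensible near the +/-180 degree wrap-around, which matters for omega in particular.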

Again, I am a noob, so pointers to any papers or other grounded reasoning for this approach would be really appreciated.

TrentBrick avatar Apr 18 '19 23:04 TrentBrick

Along similar lines, in preprocessing.py you take the ProteinNet tertiary data, which is in coordinate format, convert it into angles, and then convert it back to coordinates again. Why?

Starting from line 132:

    angles, batch_sizes = calculate_dihedral_angles_over_minibatch(pos, [len(prim)], use_gpu=use_gpu)
    tertiary, _ = get_backbone_positions_from_angular_prediction(angles, batch_sizes, use_gpu=use_gpu)
    tertiary = tertiary.squeeze(1)

TrentBrick avatar Apr 19 '19 14:04 TrentBrick

Any further insight on this would be really appreciated!

TrentBrick avatar Apr 20 '19 21:04 TrentBrick

For inspiration on model design, you're probably best off reading https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30076-6

In preprocessing.py we currently convert to angles and back again to ensure that the distances between amino acids are exactly the ones we use in the pnerf module. Going from coordinates -> angles -> coordinates should give back exactly the same coordinates. However, the original (measured) coordinates can contain some noise, so this is essentially a preprocessing step to remove it.
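The round trip described above can be sketched for a single atom. This is a rough illustration under stated assumptions, not the pnerf module itself: the helper names, the toy coordinates and the "ideal" bond length and bond angle are all hypothetical. The torsion is read from the "measured" coordinates, and the atom is then rebuilt with fixed geometry, so the torsion survives while the measured bond length is replaced by the fixed one.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    norm = math.sqrt(dot(a, a))
    return tuple(x / norm for x in a)


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle (radians) about the p1-p2 bond."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)
    m1 = cross(unit(b1), n1)
    return math.atan2(dot(m1, n2), dot(n1, n2))


def place_atom(a, b, c, length, bond_angle, torsion):
    """Place atom d after a, b, c from internal coordinates (NeRF-style).

    length is |c-d|, bond_angle is the b-c-d angle, torsion is a-b-c-d.
    """
    bc = unit(sub(c, b))
    n = unit(cross(sub(b, a), bc))
    m = cross(n, bc)
    along = -length * math.cos(bond_angle)
    radial = length * math.sin(bond_angle)
    return tuple(
        c[i] + along * bc[i]
        + radial * (math.cos(torsion) * m[i] + math.sin(torsion) * n[i])
        for i in range(3)
    )


# Toy "measured" fragment; the last bond is slightly off the ideal length.
a = (1.0, 0.0, 0.0)
b = (0.0, 0.0, 0.0)
c = (0.0, 0.0, 1.5)
d_measured = (0.9, 1.1, 2.1)

IDEAL_LENGTH = 1.5                  # illustrative value, not from pnerf
IDEAL_ANGLE = math.radians(110.0)   # illustrative value, not from pnerf

# coordinates -> angles
torsion = dihedral(a, b, c, d_measured)
# angles -> coordinates, using the fixed geometry
d_rebuilt = place_atom(a, b, c, IDEAL_LENGTH, IDEAL_ANGLE, torsion)

torsion_rebuilt = dihedral(a, b, c, d_rebuilt)
length_measured = math.dist(c, d_measured)
length_rebuilt = math.dist(c, d_rebuilt)
```

In this sketch a second round trip changes nothing: once the fragment uses the fixed bond length and angle, converting to angles and back reproduces the same coordinates.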

JeppeHallgren avatar Apr 22 '19 19:04 JeppeHallgren

Thanks. I read this paper a while ago and didn't remember the right side of Figure 2 with the "torsional alphabet"; it may have been added in a later edition.

I still don't see any information about using a mixture model, nor any actual mixture model angles in the RGN GitHub repo (you have three different files for these). Did you generate these yourself, or did you correspond with AlQuraishi to get them?

And the preprocessing.py noise removal makes sense.

TrentBrick avatar Apr 22 '19 20:04 TrentBrick