Keys in hdf5

Open • maverick0004 opened this issue 5 years ago • 2 comments

Hi, nice work here. I wanted to ask about the preprocessing step: after converting the raw data to the HDF5 file, there are only `primary`, `mask` and `tertiary` keys. Does this mean the model is trained only on the amino acid sequence? According to AlQuraishi's paper, shouldn't the input be the amino acid sequence plus the PSSM?

maverick0004 • Jun 14 '19 15:06

Hey @maverick0004! Correct, currently this only uses the amino acid sequence. However, since the PSSM data is included in the ProteinNet data set, it should be quick to add it to the HDF5 file and the model :) The relevant code that parses the ProteinNet format is here: https://github.com/OpenProtein/openprotein/blob/master/preprocessing.py#L53
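As an illustration of the suggestion above, here is a minimal sketch (Python, standard library only) that reads the PSSM out of a ProteinNet text record and concatenates it with a one-hot encoding of the sequence. It assumes the ProteinNet text layout, in which the `[EVOLUTIONARY]` section has 21 rows (20 PSSM rows plus, as far as I know, one information-content row) with one value per residue. The function names `parse_record`, `evolutionary_profile` and `input_features` are hypothetical and are not taken from openprotein's `preprocessing.py`.

```python
# Minimal sketch: parse one ProteinNet text record and build per-residue
# input features (one-hot sequence + evolutionary profile).
# ASSUMPTION: [EVOLUTIONARY] holds 21 whitespace-separated rows
# (20 PSSM rows + 1 information-content row), each with one value per residue.
# The function names below are hypothetical, not openprotein's API.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def parse_record(lines):
    """Split a ProteinNet record into {section_name: [non-empty content lines]}."""
    sections = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]          # e.g. "PRIMARY", "EVOLUTIONARY"
            sections[current] = []
        elif line and current is not None:
            sections[current].append(line)
    return sections


def evolutionary_profile(sections):
    """Return the [EVOLUTIONARY] block as one list of 21 floats per residue."""
    rows = [[float(v) for v in row.split()] for row in sections["EVOLUTIONARY"]]
    # Transpose from (21 rows x L residues) to (L residues x 21 values).
    return [list(col) for col in zip(*rows)]


def input_features(sections):
    """Concatenate a 20-dim one-hot residue encoding with its 21-dim profile."""
    seq = sections["PRIMARY"][0]
    profile = evolutionary_profile(sections)
    if len(profile) != len(seq):
        raise ValueError("PSSM length does not match sequence length")
    features = []
    for residue, pssm in zip(seq, profile):
        one_hot = [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]
        features.append(one_hot + pssm)   # 20 + 21 = 41 values per residue
    return features
```

For a protein with `L` residues, `input_features` returns `L` vectors of 41 values. These could be written as an extra HDF5 dataset next to `primary`, `mask` and `tertiary` (for example with h5py's `create_dataset`) and fed to the model instead of the sequence alone; the model's input layer would then need to be resized to match, which is an assumption about how the model is wired.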

JeppeHallgren • Jun 15 '19 11:06

@JeppeHallgren So if only the amino acid sequence is used as input, won't the predictions be less accurate than those of AlQuraishi's model, which uses the sequence plus the PSSM?

maverick0004 • Jun 17 '19 08:06