multitask-learning-protein-prediction
Multitask learning: protein secondary structure prediction, B-value prediction and solvent accessibility prediction
Description
Multitask learning (jointly predicting secondary structure, B-values and solvent accessibility) can improve the accuracy of protein secondary structure prediction.
- The data suffer from class imbalance, which has to be handled during training.
- Folders named "foldername_cv" use 5-fold cross-validation.
- Distribution of outputs:
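The README does not say how the class imbalance was handled or how the folds were built. As one possible approach, the Python 3 sketch below splits proteins into 5 contiguous folds and computes inverse-frequency class weights (the same formula as scikit-learn's "balanced" mode) that could scale each residue's loss term. The function names `kfold_indices` and `inverse_frequency_weights` are hypothetical, not taken from this repo.

```python
from collections import Counter


def kfold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, non-overlapping folds.

    Assumption: indices refer to whole proteins, not residues, so residues of
    one chain never land in both the training and the test split. Shuffle the
    protein list first if its order is meaningful.
    """
    if not 1 < k <= n_samples:
        raise ValueError("need 1 < k <= n_samples")
    # The first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def inverse_frequency_weights(labels):
    """Map each class to n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and frequent ones below 1, so multiplying
    each residue's loss by its class weight counters class imbalance.
    """
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {label: n / (c * count) for label, count in counts.items()}


folds = list(kfold_indices(12, 5))
print([len(test) for _, test in folds])  # [3, 3, 2, 2, 2]
print(inverse_frequency_weights(list("HHHHEEC")))
```

In the toy label list, the single "C" residue gets the largest weight and the four "H" residues the smallest, which is the intended effect.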
Data
The data are copyrighted by http://rostlab.org/ and cannot be made public.
Data representation
Sequences are encoded with ProtVec (3-gram embeddings) following the vector addition rule. For example:
TNCDE = UTN + TNC + NCD + CDE + DEU
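A minimal Python 3 sketch of this representation. It assumes "U" in the example is a padding symbol at both ends of the sequence and uses a toy 2-dimensional embedding table in place of the real ProtVec vectors; the helper names are hypothetical. `padded_trigrams` gives one 3-gram per residue, and `sequence_vector` sums them as in the TNCDE example.

```python
def padded_trigrams(sequence, pad="U"):
    """Return one 3-gram per residue, centred on that residue.

    Assumption: "U" pads both ends, so the first residue T gets "UTN" and
    the last residue E gets "DEU".
    """
    padded = pad + sequence + pad
    return [padded[i:i + 3] for i in range(len(sequence))]


def sequence_vector(sequence, embeddings, dim):
    """Sum the 3-gram vectors of a sequence (vector addition rule).

    Assumption: 3-grams missing from `embeddings` contribute a zero vector.
    """
    total = [0.0] * dim
    for gram in padded_trigrams(sequence):
        vec = embeddings.get(gram, [0.0] * dim)
        total = [a + b for a, b in zip(total, vec)]
    return total


# Toy 2-dimensional vectors standing in for real ProtVec embeddings.
toy = {"UTN": [1.0, 0.0], "TNC": [0.0, 1.0], "NCD": [1.0, 1.0],
       "CDE": [2.0, 0.0], "DEU": [0.0, 2.0]}
print(padded_trigrams("TNCDE"))          # ['UTN', 'TNC', 'NCD', 'CDE', 'DEU']
print(sequence_vector("TNCDE", toy, 2))  # [4.0, 4.0]
```

Whether lstm.py feeds the per-residue 3-gram vectors or the summed vector to the network is not stated in this README; a sequence model such as an LSTM would usually consume the per-residue list.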
Multitask learning model
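As an illustration of hard parameter sharing, one common way to set up multitask learning, the sketch below computes the training loss for a single residue: a shared feature vector (in this repo presumably produced by the LSTM in lstm.py) feeds three task-specific softmax heads, and the per-task cross-entropies are summed with task weights. The task names, head shapes and equal weighting are assumptions, not the repo's exact setup; the code is plain Python 3 so it runs without TensorFlow.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weights, bias):
    """Dense layer: `weights` holds one row per output class."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def cross_entropy(logits, target):
    """Negative log-probability of the target class index."""
    return -math.log(softmax(logits)[target])


def multitask_loss(shared, heads, targets, task_weights):
    """Weighted sum of per-task cross-entropies for one residue.

    shared:       feature vector from the shared encoder (hypothetical)
    heads:        task name -> (weights, bias) of that task's output layer
    targets:      task name -> true class index
    task_weights: task name -> scalar weight on that task's loss
    """
    total = 0.0
    for task, (weights, bias) in heads.items():
        logits = linear(shared, weights, bias)
        total += task_weights[task] * cross_entropy(logits, targets[task])
    return total


# All-zero heads give uniform predictions, so each 3-class task
# contributes log(3) and the total is 3 * log(3).
zero_head = ([[0.0, 0.0]] * 3, [0.0] * 3)
heads = {"ss": zero_head, "sa": zero_head, "bvalue": zero_head}
targets = {"ss": 0, "sa": 2, "bvalue": 1}
task_w = {"ss": 1.0, "sa": 1.0, "bvalue": 1.0}
loss = multitask_loss([0.5, -1.0], heads, targets, task_w)
print(loss)
```

Because the encoder is shared, gradients from the solvent accessibility and B-value losses also update the features used by the secondary structure head, which is the mechanism by which the auxiliary tasks can help.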
Results
3-state protein secondary structure
Multi-task learning (3 tasks, 3 states):
- Secondary structure accuracy (3 states): 69.0%
- Solvent accessibility accuracy (3 states): 54.6%
- B-value accuracy (3 states): 59.1%
8-state protein secondary structure
Multi-task learning (3 tasks, 8 states):
- Secondary structure accuracy (8 states): 47.6%
- Solvent accessibility accuracy (3 states): 54.8%
- B-value accuracy (3 states): 59.8%
- Secondary structure
- Solvent accessibility
- b-values
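Assuming the accuracies above are per-residue accuracies (the fraction of residues whose predicted class matches the true class), the Python 3 sketch below computes that metric and shows one common reduction from the eight DSSP states to three (H, G, I → H; E, B → E; everything else → C). The README does not say which reduction, if any, its data use, so treat the mapping and the helper names as assumptions. Merging sub-types makes the 3-state task easier, which is consistent with the 8-state accuracy (47.6%) being lower than the 3-state one (69.0%).

```python
# One common DSSP 8-to-3 reduction (assumption: not confirmed by this repo).
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H",
                  "E": "E", "B": "E",
                  "T": "C", "S": "C", "-": "C", "C": "C"}


def reduce_to_three(labels):
    """Map a list of 8-state labels to 3-state labels."""
    return [EIGHT_TO_THREE[s] for s in labels]


def accuracy(predicted, true):
    """Fraction of residues whose predicted label equals the true label."""
    if len(predicted) != len(true) or not true:
        raise ValueError("predicted and true must be non-empty and equal length")
    return sum(p == t for p, t in zip(predicted, true)) / len(true)


true8 = list("HGIEBTS-")
pred8 = list("HHHEETTC")
print(accuracy(pred8, true8))                                    # 0.375
print(accuracy(reduce_to_three(pred8), reduce_to_three(true8)))  # 1.0
```

Here the predictions confuse helix, strand and coil sub-types, so they score poorly on 8 states but perfectly once the sub-types are merged.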
Prerequisites
- Python 2.7
- TensorFlow 1.4.0
- ProtVec
How to run
Go into each subfolder and run:
- python lstm.py
Author
Binh Do
License
This project is licensed under the MIT License