Protein-feature-extraction
Protein Feature Extraction for Machine Learning
Python code to extract features from Protein sequences for Machine Learning/Deep Learning
Feature extraction is carried out using the Biopython package.
 Format:
Features (27 features):
- AA-count (20x features)
- aromaticity (1x)
- secondary_structure_fraction (3x)
- isoelectric_point (1x)
- molecular_weight (1x)
- instability_index (1x)
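To make the first two feature groups concrete, here is a minimal, dependency-free sketch of the amino-acid counts and aromaticity (taken here as the fraction of F, W and Y residues). It is an illustration only, not the package's own code: the package relies on Biopython, whose `ProteinAnalysis` class in `Bio.SeqUtils.ProtParam` provides methods for all six feature groups listed above. The function name `simple_features`, the column order, and the example sequence are assumptions made for this sketch.

```python
from collections import Counter

# The 20 standard amino acids, in alphabetical one-letter order (assumed column order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Residues counted as aromatic: Phe, Trp, Tyr.
AROMATIC = "FWY"


def simple_features(sequence):
    """Return 20 amino-acid counts followed by aromaticity (21 values)."""
    seq = sequence.upper()
    counts = Counter(seq)
    aa_counts = [counts.get(aa, 0) for aa in AMINO_ACIDS]
    # Aromaticity: share of aromatic residues in the whole sequence.
    aromatic_total = sum(counts.get(aa, 0) for aa in AROMATIC)
    aromaticity = aromatic_total / len(seq) if seq else 0.0
    return aa_counts + [aromaticity]


# Hypothetical example sequence, used only for illustration.
features = simple_features("MKWVTFISLLFLFSSAYS")
print(len(features), features[-1])
```

The remaining six values (secondary structure fractions, isoelectric point, molecular weight, instability index) depend on lookup tables and are best left to Biopython.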
Packages required for the execution of the code:
- Pandas
- Biopython
- pickle (part of the Python standard library)
- subprocess (part of the Python standard library)
Top N features for identifying insulin protein sequences
 Format:
Installation
For Windows
Windows users must specify the paths to the FASTA files and the output folder using forward slashes (/) rather than backslashes (\),
e.g. C:/folder_name/file_name.fasta
This issue will be fixed in a future update.
pip install discere
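As a possible workaround for the Windows path requirement, a backslash path can be converted to the forward-slash form with the standard library before it is passed on. This is a sketch only; the helper name `to_forward_slashes` is hypothetical and not part of discere.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path):
    """Convert a Windows path such as C:\\a\\b.fasta to C:/a/b.fasta."""
    # PureWindowsPath parses backslashes on any operating system;
    # as_posix() renders the same path with forward slashes.
    return PureWindowsPath(windows_path).as_posix()


fasta_path = to_forward_slashes(r"C:\folder_name\file_name.fasta")
print(fasta_path)
```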
For Linux
pip3 install discere
Usage
  import discere.discere as di
  
  di.extract_feature('./Documents/positive_training.fasta', 
                     './Documents/negative_training.fasta', 
                     './Documents')
di.extract_feature(input_file1, input_file2, output_directory)
Output
Outputs are stored in user_specified_path/output in .txt, .arff and .csv formats.
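After a run, the generated files can be located with a small helper like the one below. This is a sketch under the assumption that the files sit directly inside the `output` subfolder of the directory passed to `extract_feature`; the actual file names are not documented here, so the helper (a hypothetical `list_outputs`) filters by extension only.

```python
from pathlib import Path

# Extensions mentioned in the README for the generated files.
OUTPUT_SUFFIXES = {".txt", ".arff", ".csv"}


def list_outputs(output_directory):
    """Return the .txt, .arff and .csv files found in <output_directory>/output."""
    out_dir = Path(output_directory) / "output"
    if not out_dir.is_dir():
        return []  # nothing has been written yet
    return sorted(
        p for p in out_dir.iterdir()
        if p.is_file() and p.suffix.lower() in OUTPUT_SUFFIXES
    )


for path in list_outputs("./Documents"):
    print(path)
```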