graphein
[WIP] Add FASTA Dataset class
Reference Issues/PRs
Waiting on #272
What does this implement/fix? Explain your changes
Adds a dataset class for working with sequence datasets in FASTA format. It provides utilities for batch folding and embedding with ESM and ESMFold.
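To make the intended behaviour concrete, here is a minimal sketch of parsing FASTA records and grouping them into batches that a folding or embedding model could consume. It is not the implementation in this PR: `parse_fasta`, `FastaDatasetSketch`, and the `batch_size` parameter are hypothetical names chosen for illustration, and the model call itself is deliberately left out.

```python
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {record_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID;
            # fall back to a positional ID if the header is empty.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else f"seq_{len(records)}"
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so concatenate them.
            records[current_id] += line
    return records


class FastaDatasetSketch:
    """Hypothetical, illustrative dataset over (id, sequence) pairs."""

    def __init__(self, fasta_text: str, batch_size: int = 2) -> None:
        self.records: List[Tuple[str, str]] = list(parse_fasta(fasta_text).items())
        self.batch_size = batch_size

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> Tuple[str, str]:
        return self.records[idx]

    def batches(self) -> Iterator[List[Tuple[str, str]]]:
        """Yield consecutive batches of records, e.g. for batch folding."""
        for start in range(0, len(self.records), self.batch_size):
            yield self.records[start : start + self.batch_size]
```

Each batch is a plain list of `(id, sequence)` tuples, so it can be handed to whichever folding or embedding routine is in use without the dataset needing to know about the model.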
- [ ] Set representative structure. For protein engineering tasks, we can predict a single wild-type (WT) structure, use it as the structure for all mutants, and simply modify the residue types accordingly.
- [ ] FoldComp compression of the predicted structures. Ideally this would run in the ESMFold step, but we can also do it post hoc.
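The representative-structure item above relies on deriving each mutant's residue types from the WT sequence. A rough sketch of that step follows, assuming mutations are written in the common `<WT residue><1-based position><mutant residue>` form (e.g. `K2R`); the function name `apply_mutations` and that notation are assumptions for illustration, not part of this PR.

```python
import re
from typing import Iterable

# Matches point mutations such as "K2R": WT residue, 1-based position, mutant residue.
_MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(wt_sequence: str, mutations: Iterable[str]) -> str:
    """Return the mutant sequence obtained by applying point mutations to the WT."""
    residues = list(wt_sequence)
    for mutation in mutations:
        match = _MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError(f"Unrecognised mutation format: {mutation!r}")
        wt_res, position, mut_res = match.group(1), int(match.group(2)), match.group(3)
        index = position - 1  # convert 1-based position to 0-based index
        if not 0 <= index < len(residues):
            raise ValueError(f"Position {position} is outside the sequence")
        # Validate against the original WT sequence, not the partially mutated one.
        if wt_sequence[index] != wt_res:
            raise ValueError(
                f"Expected {wt_res} at position {position}, found {wt_sequence[index]}"
            )
        residues[index] = mut_res
    return "".join(residues)
```

Under this setup the WT structure is predicted once, and only the residue-type labels change per mutant, for example `apply_mutations("MKTAY", ["K2R", "Y5F"])` for a double mutant.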
What testing did you do to verify the changes in this PR?
Pull Request Checklist
- [ ] Added a note about the modification or contribution to the `./CHANGELOG.md` file (if applicable)
- [ ] Added appropriate unit test functions in the `./graphein/tests/*` directories (if applicable)
- [ ] Modify documentation in the corresponding Jupyter Notebook under `./notebooks/` (if applicable)
- [ ] Ran `python -m py.test tests/` and make sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., `python -m py.test tests/protein/test_graphs.py`)
- [ ] Checked for style issues by running `black .` and `isort .`
Kudos, SonarCloud Quality Gate passed! 
0 Bugs
0 Vulnerabilities
0 Security Hotspots
2 Code Smells
No Coverage information
0.0% Duplication