[WIP] Add FASTA Dataset class

Open • a-r-j opened this issue 2 years ago • 1 comment

Reference Issues/PRs

Waiting on #272

What does this implement/fix? Explain your changes

Dataset class for working with sequence datasets (e.g. FASTA files). Provides utilities for batch folding with ESMFold and batch embedding with ESM.
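A minimal sketch of the preprocessing such a sequence dataset might do before folding or embedding, assuming plain FASTA input: parse the records, then group them into length-sorted batches under a padded-residue budget (a common way to limit padding). `SequenceRecord`, `parse_fasta`, `length_batches` and `max_residues` are hypothetical names for illustration, not Graphein's API, and the calls into ESM/ESMFold are deliberately left out.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class SequenceRecord:
    """One FASTA entry: the first token of the header line and its sequence."""
    id: str
    sequence: str


def parse_fasta(text: str) -> List[SequenceRecord]:
    """Parse FASTA-formatted text; multi-line sequences are concatenated."""
    records: List[SequenceRecord] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append(SequenceRecord(header, "".join(chunks)))
            parts = line[1:].split()
            header = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if header is not None:
        records.append(SequenceRecord(header, "".join(chunks)))
    return records


def length_batches(
    records: Iterable[SequenceRecord], max_residues: int
) -> Iterator[List[SequenceRecord]]:
    """Yield length-sorted batches whose padded size stays within max_residues.

    A sequence longer than max_residues still gets a batch of its own.
    """
    batch: List[SequenceRecord] = []
    for rec in sorted(records, key=lambda r: len(r.sequence)):
        n = len(rec.sequence)
        # Records arrive in ascending length, so n is the longest in the batch and
        # the padded cost of adding this record is n * (current batch size + 1).
        if batch and n * (len(batch) + 1) > max_residues:
            yield batch
            batch = []
        batch.append(rec)
    if batch:
        yield batch
```

For example, with `max_residues=20`, records of lengths 3, 10 and 10 are split into one batch of two (3 and 10) and one batch of one (10), since three sequences padded to length 10 would cost 30 residues.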

  • [ ] Set representative structure. For protein engineering tasks, we can predict a single wild-type (WT) structure, reuse it as the structure for all mutants, and simply modify the residue types at the mutated positions.

  • [ ] FoldComp compression of the predicted structures. Ideally this would run in the ESMFold step, but we can also do it post hoc.
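A minimal sketch of the representative-structure idea from the first item, assuming substitution-only mutants (same length as the WT) and a structure stored as a plain list of per-residue dicts with `residue_number`, `residue_name` and `coords` keys. `ONE_TO_THREE`, `find_mutations` and `relabel_residues` are hypothetical names for illustration, not part of Graphein or this PR.

```python
from typing import Dict, List, Tuple

# Standard one-letter to three-letter amino-acid codes
ONE_TO_THREE: Dict[str, str] = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def find_mutations(wt: str, mutant: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, WT residue, mutant residue) for each substitution."""
    if len(wt) != len(mutant):
        raise ValueError("only equal-length (substitution) mutants are supported")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(wt, mutant)) if a != b]


def relabel_residues(wt_residues: List[Dict], wt: str, mutant: str) -> List[Dict]:
    """Copy the WT residue table, swapping residue names at mutated positions.

    Coordinates are left untouched: the mutant inherits the WT structure.
    The input list and its dicts are not modified.
    """
    mutated = {pos: ONE_TO_THREE[new] for pos, _, new in find_mutations(wt, mutant)}
    return [
        {**res, "residue_name": mutated.get(res["residue_number"], res["residue_name"])}
        for res in wt_residues
    ]
```

This only makes sense at residue level (e.g. one C-alpha coordinate per residue): in an all-atom table, the WT side-chain atoms would no longer match the new residue type.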

What testing did you do to verify the changes in this PR?

Pull Request Checklist

  • [ ] Added a note about the modification or contribution to the ./CHANGELOG.md file (if applicable)
  • [ ] Added appropriate unit test functions in the ./graphein/tests/* directories (if applicable)
  • [ ] Modified documentation in the corresponding Jupyter Notebook under ./notebooks/ (if applicable)
  • [ ] Ran python -m py.test tests/ and made sure that all unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., python -m py.test tests/protein/test_graphs.py)
  • [ ] Checked for style issues by running black . and isort .

a-r-j, Mar 29 '23 21:03

Kudos, SonarCloud Quality Gate passed!

Bugs: A (0 Bugs)
Vulnerabilities: A (0 Vulnerabilities)
Security Hotspots: A (0 Security Hotspots)
Code Smells: A (2 Code Smells)

No coverage information
0.0% duplication

sonarqubecloud[bot], Mar 29 '23 21:03