deepchem
SequencesFeaturizer Folder Introduced also init changes
Pull Request Template
Description
This PR introduces sequence featurizers into the DeepChem featurizers as a new type, focused specifically on bioinformatics applications. Other featurizers for genomic and proteomic sequences should be added here.
The documentation and init files were updated so the new featurizers are easy to import.
Fix #(issue)
Type of change
Please check the option that is related to your PR.
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- In this case, we recommend discussing your modification on GitHub issues before creating the PR
- [ ] Documentation (modification of documents)
Checklist
- [ ] My code follows the style guidelines of this project
- [ ] Run `yapf -i <modified file>` and check no errors (yapf version must be 0.22.0)
- [ ] Run `mypy -p deepchem` and check no errors
- [ ] Run `flake8 <modified file> --count` and check no errors
- [ ] Run `python -m doctest <modified file>` and check no errors
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New unit tests pass locally with my changes
- [ ] I have checked my code and corrected any misspellings
Hi @rbharath! I compiled the documentation locally, but in the `untransform` method of `SparseMatrixOneHotFeaturizer` something from my local path appears, related to scipy.sparse (https://docs.scipy.org/doc/scipy/reference/sparse.html).
I do not know whether this is because it is a local build or because there is a problem with the typing of the function.
@tonydavis629! This small PR could help separate out the new sequence featurizers that can be included and used for bioinformatics. (`SparseMatrixOneHotFeaturizer` was built specifically to better support long datasets and can be used here.) Other featurizers that work on sequences, such as the ones you are working on for MSA, can be classified here too.
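To illustrate why a sparse one-hot encoding helps with long sequences, here is a minimal sketch of the general idea using `scipy.sparse`. It is not the DeepChem implementation: the helper names `sparse_one_hot` and `decode`, the four-letter alphabet, and the choice to leave unknown characters as all-zero rows are all assumptions made for this example.

```python
import numpy as np
from scipy import sparse


def sparse_one_hot(sequence, alphabet):
    """Hypothetical helper (not DeepChem API): encode `sequence` as a
    (len(sequence), len(alphabet)) CSR matrix with one 1 per known character."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows, cols = [], []
    for pos, ch in enumerate(sequence):
        col = index.get(ch)
        if col is not None:  # assumption: unknown characters stay all-zero
            rows.append(pos)
            cols.append(col)
    data = np.ones(len(rows), dtype=np.int8)
    shape = (len(sequence), len(alphabet))
    return sparse.csr_matrix((data, (rows, cols)), shape=shape)


def decode(matrix, alphabet):
    """Hypothetical inverse: rebuild the string from the stored column indices.
    Rows without a stored entry (unknown characters) are skipped."""
    coo = matrix.tocoo()
    order = np.argsort(coo.row)  # make sure positions are in sequence order
    return "".join(alphabet[c] for c in coo.col[order])


encoded = sparse_one_hot("ACGTTGCA", "ACGT")
print(encoded.shape, encoded.nnz)  # matrix dimensions and number of stored entries
print(decode(encoded, "ACGT"))
```

Only one entry per position is stored, so memory grows with the sequence length rather than with sequence length times alphabet size, which is the property that matters for long sequences.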