Add support for structural alphabets

Open padix-key opened this issue 5 months ago • 0 comments

Structural alphabets are a fusion of structure and sequence methods and can greatly benefit from the functionality already implemented in Biotite. In summary, they tokenize each residue into a symbol from an alphabet of limited size. The resulting sequence can then be used as input to sequence-based methods. Support for the following structural alphabets is planned:

  • [x] Protein Blocks (#676)
  • [ ] ~CLePAPS (#681)~ (on hold due to inconsistency with reference implementation)
  • [x] 3Di (#665)
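
To illustrate the tokenization idea described above, here is a minimal toy sketch in plain Python. It is not Protein Blocks, 3Di, or any Biotite API: the three-letter alphabet, the angle thresholds, the `*` placeholder, and the helper names `tokenize` and `identity` are all assumptions made up for this example. It only shows how per-residue structural features (here backbone dihedral angles) could be mapped to symbols, and how the resulting strings can be fed to an ordinary sequence-based comparison.

```python
import math

# Toy 3-letter alphabet, invented for illustration only
UNDEFINED_SYMBOL = "*"  # hypothetical placeholder for residues without defined angles


def tokenize(phi_psi):
    """Map each residue's (phi, psi) pair in degrees to one symbol."""
    symbols = []
    for phi, psi in phi_psi:
        if math.isnan(phi) or math.isnan(psi):
            # Terminal residues lack one of the two angles
            symbols.append(UNDEFINED_SYMBOL)
        elif phi < 0 and -120 <= psi <= 30:
            symbols.append("a")  # roughly helical region
        elif phi < 0:
            symbols.append("b")  # roughly extended region
        else:
            symbols.append("c")  # everything else
    return "".join(symbols)


def identity(seq1, seq2):
    """Fraction of identical symbols, skipping positions with an undefined symbol."""
    pairs = [(s, t) for s, t in zip(seq1, seq2) if UNDEFINED_SYMBOL not in (s, t)]
    return sum(s == t for s, t in pairs) / len(pairs)


# Made-up dihedral angles for two short, equally long backbones
nan = float("nan")
structure_1 = [(nan, 150), (-60, -45), (-65, -40), (-120, 130), (60, 40), (-70, nan)]
structure_2 = [(nan, 140), (-58, -47), (-110, 135), (-125, 128), (55, 45), (-60, nan)]

seq_1 = tokenize(structure_1)
seq_2 = tokenize(structure_2)
print(seq_1, seq_2, identity(seq_1, seq_2))
```

A real structural alphabet uses a far richer per-residue description and a dedicated substitution matrix, but the workflow is the same: structure in, symbol sequence out, then sequence methods on top.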

Furthermore, the following tasks need to be done for all of them:

  • [x] Add common undefined_symbol for all structural alphabets
  • [x] Add benchmark for each method
  • [x] Move their test modules into tests/structure/alphabet
  • [x] Generate color schemes for their substitution matrices with gecos and update the color_schemes example
  • [ ] Add docstring for biotite.structure.alphabet
  • [ ] Add tutorial for structural alphabets
  • [ ] Add at least one example
  • [x] Mention structural alphabets in the Sequence section on the home page

padix-key, Sep 15 '24 07:09