biotrainer
Predictions for a secondary structure model ([dataset](https://github.com/J-SNACKKB/FLIP/tree/main/splits/secondary_structure)) should roughly match those from the [prottrans paper](https://ieeexplore.ieee.org/document/9477085). This could also be used to create a new test for the inferencer module with...
After migrating from [bio_embeddings](https://github.com/sacdallago/bio_embeddings) to calculate embeddings directly in biotrainer for the provided sequences, it is now theoretically possible to allow for fine-tuning existing protein language models (pLMs) such as...
In protein-protein interaction mode, embeddings that consist of a single value can't be concatenated with `torch.concat`. They would first need to be reshaped, e.g. with `embedding1.reshape(1)`.
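A minimal sketch of the reshaping, assuming the single-value embeddings are stored as zero-dimensional tensors. The names `embedding1` and `embedding2` and their values are illustrative and not taken from biotrainer.

```python
import torch

# Hypothetical single-value embeddings, stored as zero-dimensional tensors.
embedding1 = torch.tensor(0.25)
embedding2 = torch.tensor(0.75)

try:
    torch.concat([embedding1, embedding2])
    concat_failed = False
except RuntimeError:
    # Zero-dimensional tensors cannot be concatenated directly.
    concat_failed = True

# Reshaping each value to shape (1,) makes the concatenation possible.
interaction = torch.concat([embedding1.reshape(1), embedding2.reshape(1)])
```

The result is a one-dimensional tensor with two entries, one per interaction partner.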
It would be nice to have a tutorial on how to use custom embedders with biotrainer. This way, new protein language models can be used directly in biotrainer without having to...
The ppi interaction mode is not yet compatible with all protocols. `sequence_to_class` has been tested thoroughly. Other per-sequence protocols should work as well. However, for per-residue tasks (`residue_to_class`), changes...
This is a very worthwhile effort. Are you considering adding the BERT transformer encoder model and the associated masked language modeling task for pre-training? The task is actually the same...
Once the cross_validation PR is merged, parameter search for nested cross validation will be enabled. It would be nice to extend this behaviour to hold_out cross validation as well. A...
As a researcher, I would find it useful to have an automatic random baseline as a comparison for every run. This could be included in the final test metrics: `test set...
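One possible form of such a baseline, as a sketch only: predictions are sampled at random from the training label distribution and scored against the test labels. The function name, its parameters, and the choice of accuracy as the metric are assumptions for illustration, not part of biotrainer.

```python
import random
from collections import Counter


def random_baseline_accuracy(train_labels, test_labels, n_repeats=100, seed=42):
    """Mean accuracy of predictions sampled from the training label distribution."""
    rng = random.Random(seed)
    counts = Counter(train_labels)
    classes = list(counts)
    weights = [counts[c] for c in classes]

    accuracies = []
    for _ in range(n_repeats):
        # Draw one random prediction per test sample, weighted by class frequency.
        predictions = rng.choices(classes, weights=weights, k=len(test_labels))
        correct = sum(p == t for p, t in zip(predictions, test_labels))
        accuracies.append(correct / len(test_labels))
    return sum(accuracies) / n_repeats
```

Averaging over several repetitions makes the reported baseline less dependent on a single random draw.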
The LightAttention model used for the residues_to_class protocol uses BatchNorm1D. However, a batch size of 1 is not possible with BatchNorm1D. Because a batch size of 1 is an...
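The limitation can be reproduced in isolation with a standalone `torch.nn.BatchNorm1d` layer. This is a minimal sketch on a two-dimensional input of shape (batch, features); it does not reproduce the LightAttention model itself, and the feature size of 4 is arbitrary.

```python
import torch
from torch import nn

batch_norm = nn.BatchNorm1d(num_features=4)
single_sample = torch.randn(1, 4)  # batch size 1, shape (batch, features)

batch_norm.train()
try:
    batch_norm(single_sample)
    train_failed = False
except ValueError:
    # In training mode, batch statistics cannot be estimated
    # from a single value per channel.
    train_failed = True

batch_norm.eval()
# In evaluation mode the layer uses its running statistics instead.
eval_output = batch_norm(single_sample)
```

The failure only occurs in training mode. In evaluation mode a single sample passes through because no batch statistics are computed.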
Currently, the config file is loaded first (but not yet completely sanity checked; for example, biotrainer does not check whether the input files actually exist, so embeddings might be...
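An early existence check for input files could look roughly like the sketch below. The config keys in `FILE_KEYS` and the function name are hypothetical and chosen for illustration only; they are not taken from biotrainer's actual configuration schema.

```python
from pathlib import Path

# Hypothetical config keys that hold input file paths (not biotrainer's schema).
FILE_KEYS = ("sequence_file", "labels_file", "embeddings_file")


def find_missing_input_files(config, base_dir="."):
    """Return (key, path) pairs for configured input files that do not exist."""
    missing = []
    for key in FILE_KEYS:
        value = config.get(key)
        if value is None:
            continue  # key not set, nothing to check
        path = Path(base_dir) / value
        if not path.is_file():
            missing.append((key, str(path)))
    return missing
```

Running a check like this directly after loading the config would surface missing files before any expensive step starts.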