biotrainer issues

Improve inference testing

Predictions for a secondary structure model ([dataset](https://github.com/J-SNACKKB/FLIP/tree/main/splits/secondary_structure)) should approximately match those from the [prottrans paper](https://ieeexplore.ieee.org/document/9477085). This could also be used to create a new test for the inferencer module with...

SebieF

good first issue

testing

:woman_scientist: Add LoRA layers to fine-tune protein language models for embeddings calculation

5

After migrating from [bio_embeddings](https://github.com/sacdallago/bio_embeddings) to calculate embeddings directly in biotrainer for the provided sequences, it is now theoretically possible to allow for fine-tuning existing protein language models (pLMs) such as...

SebieF

enhancement

refactoring
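
To illustrate the general idea behind the LoRA issue above, here is a minimal sketch of a low-rank adapter around a frozen linear layer, written with PyTorch. It is not biotrainer code: the class name `LoRALinear`, the `rank` and `alpha` parameters, and the layer sizes are all assumptions chosen for illustration, and a real integration would have to target the layers of the specific pLM.

```python
import math

import torch
from torch import nn


class LoRALinear(nn.Module):
    """Hypothetical wrapper: frozen nn.Linear plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        # Freeze the pretrained weights; only the adapter matrices are trained.
        for param in self.base.parameters():
            param.requires_grad = False
        self.lora_a = nn.Parameter(torch.empty(rank, base.in_features))
        # B starts at zero, so the wrapped layer initially behaves like the base layer.
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.kaiming_uniform_(self.lora_a, a=math.sqrt(5))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        update = x @ self.lora_a.T @ self.lora_b.T
        return self.base(x) + update * self.scaling


torch.manual_seed(0)
base = nn.Linear(16, 8)          # stand-in for one projection inside a pLM
lora = LoRALinear(base, rank=2)
x = torch.randn(3, 16)
trainable = [name for name, p in lora.named_parameters() if p.requires_grad]
print(trainable)
```

Because `lora_b` is initialised to zeros, the adapter leaves the base layer's output unchanged until training updates it, which is the usual starting point for this kind of fine-tuning.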

[ppi] Concat does not work for 0-dimensional (scalar) tensors

For the protein-protein interaction mode, scalar (0-dimensional) values can't be concatenated by `torch.concat`. A reshape such as `embedding1.reshape(1)` would be necessary.

SebieF

bug
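
The behaviour described in the concat issue above can be reproduced in isolation. The following is a small sketch, assuming only PyTorch; the tensor values are made up, and `embedding2` is a hypothetical second embedding mirroring the `embedding1` name from the issue.

```python
import torch

# Two scalar "embeddings": both have shape torch.Size([]), i.e. 0 dimensions.
embedding1 = torch.tensor(0.5)
embedding2 = torch.tensor(1.5)

try:
    torch.concat([embedding1, embedding2])
    concat_failed = False
except RuntimeError:
    # PyTorch refuses to concatenate zero-dimensional tensors.
    concat_failed = True

# Reshaping each scalar to a 1-element vector makes concatenation possible.
combined = torch.concat([embedding1.reshape(1), embedding2.reshape(1)])
print(concat_failed, combined.shape)
```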

Create tutorial how-to use a custom embedder

It would be nice to have a tutorial on how to use custom embedders with biotrainer. This way, new protein language models can be used directly in biotrainer without having to...

SebieF

documentation

[ppi] Interaction mode not compatible with all protocols yet

The ppi interaction mode is not yet compatible with all protocols. `sequence_to_class` has been tested thoroughly. Other per-sequence protocols should work as well. However, for per-residue tasks (`residue_to_class`), changes...

SebieF

enhancement

Adding BERT model and protocol = transformer encoder model + masked language modeling (MLM)

This is a very worthwhile effort. Are you considering adding the BERT transformer encoder model and the associated masked language modeling task for pre-training? The task is actually the same...

prihoda

Support multiple hyperparameters for hold_out cross validation

Once the cross_validation PR is merged, parameter search for nested cross validation will be enabled. It would be nice to extend this behaviour to hold_out cross validation as well. A...

SebieF

enhancement

good first issue

Add random comparison baseline

As a researcher, it would be nice to have an automatic random baseline as a comparison for every run. This could be included in the final test metrics: `test set...

SebieF

enhancement

good first issue
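
One possible shape for the random baseline requested above is sketched below, for a classification task. This is an assumption about what such a baseline could look like, not biotrainer's design: the function name `random_baseline_accuracy`, the choice to sample predictions from the training label distribution, and the example labels are all hypothetical.

```python
import random
from collections import Counter


def random_baseline_accuracy(train_labels, test_labels, seed=42):
    """Accuracy of predictions drawn at random from the training label distribution."""
    rng = random.Random(seed)  # fixed seed so the baseline is reproducible
    counts = Counter(train_labels)
    classes = sorted(counts)
    weights = [counts[c] for c in classes]
    predictions = rng.choices(classes, weights=weights, k=len(test_labels))
    correct = sum(pred == true for pred, true in zip(predictions, test_labels))
    return correct / len(test_labels)


train_labels = ["H", "H", "E", "C", "C", "C"]  # made-up secondary structure labels
test_labels = ["H", "C", "C", "E"]
baseline = random_baseline_accuracy(train_labels, test_labels)
print(f"random baseline accuracy: {baseline:.2f}")
```

Reporting a number like this next to the model's test metrics makes it easier to judge whether the model has learned anything beyond the class frequencies.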

BatchNorm1D does not work with batches of size 1

1

The LightAttention model used for the residues_to_class protocol uses BatchNorm1D. However, a batch size of 1 is not possible with BatchNorm1D. Because a batch size of 1 is an...

SebieF

bug

wontfix
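
The BatchNorm1D limitation above can be shown outside of biotrainer with a standalone PyTorch layer. This sketch assumes a 2-D input of shape (batch, features); the feature size of 4 is arbitrary, and nothing here is taken from the LightAttention implementation itself.

```python
import torch
from torch import nn

norm = nn.BatchNorm1d(4)
single = torch.randn(1, 4)  # a batch containing exactly one sample

# In training mode, batch statistics cannot be computed from a single value per channel.
norm.train()
try:
    norm(single)
    train_error = None
except ValueError as err:
    train_error = str(err)

# In evaluation mode the stored running statistics are used, so a single sample works.
norm.eval()
with torch.no_grad():
    eval_out = norm(single)

print(train_error)
print(eval_out.shape)
```

The failure is therefore specific to training; inference on a single sample is unaffected once the layer is in evaluation mode.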

Config - Embeddings - Targets pipeline is inefficient

Currently, the config file is loaded first (but not yet fully sanity-checked; for example, biotrainer does not check whether the input files actually exist, so embeddings might be...

SebieF

refactoring
