sequences with multiple labels

Open Joseph-Vineland opened this issue 3 years ago • 0 comments

The Basset dataset contains DNA sequences, each with 164 binary labels. I would like to fine-tune DNABERT on this dataset. However, DNABERT's fine-tuning code assumes each sequence has exactly one label. Is it possible to modify DNABERT so that it can be fine-tuned on data with multiple labels per sequence?

I know I will have to edit src/transformers/data/processors/glue.py and src/transformers/data/processors/utils.py, but I'm lost after that. Any help would be appreciated. Thank you.
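To make the data-processing side of the question concrete, here is a minimal sketch of reading one multi-label example. It assumes a hypothetical input layout of one example per line, with the k-mer sequence, a tab, then comma-separated 0/1 labels. The names `MultiLabelExample` and `parse_line` are made up for illustration and are not part of DNABERT or transformers.

```python
from dataclasses import dataclass
from typing import List

NUM_LABELS = 164  # binary labels per sequence in the Basset dataset


@dataclass
class MultiLabelExample:
    """Hypothetical container: one k-mer sequence plus its label vector."""
    text: str
    labels: List[float]


def parse_line(line: str, num_labels: int = NUM_LABELS) -> MultiLabelExample:
    """Parse 'sequence<TAB>l1,l2,...' (an assumed layout) into an example."""
    text, raw_labels = line.rstrip("\n").split("\t")
    labels = [float(value) for value in raw_labels.split(",")]
    if len(labels) != num_labels:
        raise ValueError(f"expected {num_labels} labels, got {len(labels)}")
    if any(value not in (0.0, 1.0) for value in labels):
        raise ValueError("labels must be binary (0 or 1)")
    return MultiLabelExample(text=text, labels=labels)
```

Storing the labels as floats rather than a single integer class index is deliberate: a per-label binary loss expects a target vector with one 0/1 entry per label.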

I want DNABERT to have more than one output neuron. I will have a go at modifying the code myself and will share the result if I can get it working.
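For reference, a common way to train a model with one output neuron per label is to treat each output as an independent yes/no decision: apply a sigmoid to each logit and average the binary cross-entropy over labels, instead of a softmax over classes. The sketch below shows that computation in plain Python; it is an illustration of the general technique, not DNABERT's code, and the function names are hypothetical. In PyTorch the same loss is available as `torch.nn.BCEWithLogitsLoss`.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bce_with_logits(logits: List[float], targets: List[float]) -> float:
    """Mean binary cross-entropy over labels, computed from raw logits."""
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(z)
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def predict(logits: List[float], threshold: float = 0.5) -> List[int]:
    """Turn each logit into an independent 0/1 decision."""
    return [1 if sigmoid(z) >= threshold else 0 for z in logits]
```

With this setup the final layer would have 164 outputs for the Basset data, and several labels can be predicted as 1 for the same sequence.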

Joseph-Vineland, Jun 17 '21 01:06