DNABERT
sequences with multiple labels
The Basset dataset consists of DNA sequences, each with 164 binary labels. I would like to fine-tune DNABERT on this dataset. However, DNABERT's fine-tuning code only supports a single label per sequence. Is it possible to modify DNABERT so that it can be fine-tuned on data with multiple labels per sequence?
I know I will have to edit src/transformers/data/processors/glue.py and src/transformers/data/processors/utils.py, but I'm lost after that. Any help would be appreciated. Thank you.
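On the data side, one possible starting point is a processor that reads a vector of labels per sequence instead of a single label. The sketch below is only an illustration under stated assumptions: the file layout (a tab-separated line with the k-mer sequence followed by comma-separated 0/1 labels) is hypothetical, and `MultiLabelExample` and `parse_line` are made-up names that loosely mirror the `InputExample` class in utils.py rather than anything that exists in DNABERT.

```python
from dataclasses import dataclass
from typing import List

NUM_LABELS = 164  # number of binary labels per Basset sequence


@dataclass
class MultiLabelExample:
    """Hypothetical counterpart of InputExample that holds a label vector."""
    guid: str
    text_a: str          # the (k-mer tokenized) DNA sequence
    labels: List[float]  # one 0.0/1.0 value per label


def parse_line(guid: str, line: str, num_labels: int = NUM_LABELS) -> MultiLabelExample:
    """Parse one line of an assumed 'sequence<TAB>l1,l2,...' file."""
    sequence, label_field = line.rstrip("\n").split("\t")
    labels = [float(value) for value in label_field.split(",")]
    # Fail early if the row does not carry exactly one value per label.
    if len(labels) != num_labels:
        raise ValueError(f"{guid}: expected {num_labels} labels, got {len(labels)}")
    # Labels are binary, so anything other than 0 or 1 is a data error.
    if any(value not in (0.0, 1.0) for value in labels):
        raise ValueError(f"{guid}: labels must be 0 or 1")
    return MultiLabelExample(guid=guid, text_a=sequence, labels=labels)
```

The labels are kept as floats because a multi-label loss such as binary cross-entropy expects float targets. The feature-conversion step in glue.py would then need to pass this list through instead of mapping a single label to an integer index.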
I want DNABERT to have more than one output neuron. I will have a go at modifying the code myself and will share the result if I can get it working.
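On the model side, "more than one output neuron" usually means a linear layer with one output per label, trained with a per-label sigmoid and binary cross-entropy rather than softmax cross-entropy. The following is a minimal PyTorch sketch of that idea, not DNABERT's actual code: `MultiLabelHead` is a hypothetical name, and the hidden size of 768 and the dropout of 0.1 are assumptions based on a BERT-base style encoder.

```python
import torch
from torch import nn

NUM_LABELS = 164  # number of binary labels per Basset sequence


class MultiLabelHead(nn.Module):
    """Hypothetical classification head: one logit per label."""

    def __init__(self, hidden_size: int = 768, num_labels: int = NUM_LABELS,
                 dropout: float = 0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(hidden_size, num_labels)
        # BCEWithLogitsLoss applies a sigmoid to each logit independently,
        # so every label is treated as its own binary problem.
        self.loss_fct = nn.BCEWithLogitsLoss()

    def forward(self, pooled_output: torch.Tensor, labels: torch.Tensor = None):
        logits = self.classifier(self.dropout(pooled_output))
        if labels is None:
            return logits
        # Targets must be floats with the same shape as the logits.
        loss = self.loss_fct(logits, labels.float())
        return loss, logits


if __name__ == "__main__":
    head = MultiLabelHead()
    pooled = torch.randn(4, 768)                      # stand-in for the encoder's pooled output
    targets = torch.randint(0, 2, (4, NUM_LABELS))    # random 0/1 labels
    loss, logits = head(pooled, targets)
    print(logits.shape, loss.item())
```

At prediction time, `torch.sigmoid(logits)` gives one probability per label, which can be thresholded (for example at 0.5) to get the 164 binary outputs. Metrics would also have to be computed per label instead of with the single-label accuracy used for the existing tasks.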