DNABERT

How to divide our own dataset into train, dev and test sets and assign labels for fine-tuning

Open smruti241 opened this issue 1 year ago • 2 comments

Hi @jerryji1993 , @Zhihan1996 , @project-delphi , @hjgwak , @timlautk ,

I read your paper and it's very interesting. I have a dataset that consists only of 6-mers. I want to split it into train, dev and test sets and assign labels so that I can fine-tune directly (no pre-training needed; I will use the pre-trained models). Could you tell me the procedure, or whether a script for this is available in the folders of this tool? Please let me know. Thanks!

smruti241, Mar 08 '23 20:03

Hi, yes, there is a way to load the models with HuggingFace. I have done it in this repository: https://github.com/Moeinh77/Virus-DNA-Classification

Moeinh77, Mar 20 '23 17:03

@Moeinh77 can you please tell me how to use it? I didn't fully understand. I already have k-mer data (6-mers) and want to fine-tune the pre-trained models, but my k-mer data has no labels yet.

smruti241, Mar 20 '23 18:03
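
For anyone landing here with the same question, here is a minimal sketch of one way to do the split, not an official DNABERT script. It assumes you already know which class each k-mer sequence belongs to (the labels have to come from your own data, for example one file of positives and one of negatives), and that the fine-tuning code reads tab-separated files with a `sequence` and a `label` column; please check the sample fine-tuning data in the repository to confirm the exact layout. The function names, file names, and split fractions below are all hypothetical choices.

```python
import csv
import os
import random
from collections import defaultdict


def stratified_split(rows, dev_frac=0.1, test_frac=0.1, seed=42):
    """Split (kmer_sequence, label) pairs into train/dev/test.

    The split is done separately for each label so that every set keeps
    roughly the same class balance as the full dataset.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_label = defaultdict(list)
    for seq, label in rows:
        by_label[label].append((seq, label))

    splits = {"train": [], "dev": [], "test": []}
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_dev = int(len(items) * dev_frac)
        n_test = int(len(items) * test_frac)
        splits["dev"] += items[:n_dev]
        splits["test"] += items[n_dev:n_dev + n_test]
        splits["train"] += items[n_dev + n_test:]

    # Shuffle again so examples are not grouped by label inside each set.
    for part in splits.values():
        rng.shuffle(part)
    return splits


def write_splits(splits, out_dir):
    """Write train.tsv, dev.tsv and test.tsv with a 'sequence<TAB>label' header.

    The header and file names are an assumption about the expected format.
    """
    os.makedirs(out_dir, exist_ok=True)
    for name, rows in splits.items():
        path = os.path.join(out_dir, name + ".tsv")
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh, delimiter="\t")
            writer.writerow(["sequence", "label"])
            writer.writerows(rows)
```

To use it, build the labeled list yourself, e.g. `rows = [(s, 1) for s in positives] + [(s, 0) for s in negatives]`, where `positives` and `negatives` are your own lists of space-separated 6-mer strings, then call `write_splits(stratified_split(rows), "my_data")`. Splitting per label keeps small classes from ending up entirely in one set.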