bonito
Question about training dataset for pre-trained model
Hi, I'm trying to reproduce the pre-trained bonito model. However, with the same model architecture and the dataset obtained via bonito download --training, my models converge to an accuracy roughly 0.5~0.8% lower than the pre-trained model. I'm wondering whether the pre-trained model was trained only on the provided dataset? Thank you.
According to https://github.com/nanoporetech/bonito/issues/69 (see the end of the issue), the pre-trained model was trained on additional data beyond the released set.
Doesn't the dataset now consist of 1221470 reads? Thanks.
Ah, then they must have updated the dataset since that issue; in that case my answer no longer applies.
Where is the read data stored after running bonito download --training? I was expecting .fast5 files as a result, but I only got some .npy files.
@danwarrior see https://github.com/nanoporetech/bonito/issues/4#issuecomment-562876731
@Howard-Liang no, the available training set does not fully represent that used for the pre-trained models.